Abstract

Consider a rectangular matrix describing some type of communication or
transportation between a set of origins and a set of destinations, or a classification of
objects by two attributes. The problem is to infer the entries of the matrix from limited
information in the form of constraints, generally the sums of the elements over
various subsets of the matrix, such as rows, columns, etc., or from bounds on these sums,
down to individual elements. Such problems are routinely addressed by applying the
maximum entropy method to compute the matrix numerically, but in this paper we
derive analytical, closed-form solutions. For the most complicated cases we consider,
the solution depends on the root of a non-linear equation, for which we provide an
analytical approximation in the form of a power series. Some of our solutions extend
to 3-dimensional matrices.

Besides being valid for matrices of arbitrary size, the analytical solutions exhibit 
many of the appealing properties of maximum entropy, such as precise use of the 
available data, intuitive behavior with respect to changes in the constraints, and logical 
consistency. 



1 Introduction 

Consider a set of n origins communicating with a set of m destinations. For our purposes it 
suffices that each origin is connected to each destination; the exact nature of the connection 
is not important. The communication may be in the form of transportation, e.g. the origins 
and destinations may be cities or other geographic locations, and people travel from one 
to another by some means, or commodities are transported from one to another in some 
fashion. Or the origins and destinations may be nodes connected by a communications 
network, with various sorts of traffic flowing from each source to each destination. In either 

* Email: ko@research.att.com






of these cases, the transportation or communication can be represented by a rectangular
n × m trip or traffic matrix whose (i, j)th entry gives the number of trips, volume of traffic,
units of a commodity, etc. from the ith origin to the jth destination. (The distinction
between origins and destinations is not mandatory; one could take n = m and think just
of a set of n locations.) In a different setting, we have a set of objects with two attributes,
say height and weight, color and shape, success/failure of a test and test condition, and
the objects are placed in a table according to the n-valued first attribute and the m-valued
second attribute. In this setting the n × m matrix is known as a (2-dimensional) contingency
table whose (i, j)th entry is the number of objects whose 1st attribute has the ith value
and 2nd attribute the jth value.

Whichever of these two settings obtains, we are interested in the situation where we 
have limited or incomplete information about the matrix: we do not know the individual 
elements, but know less detailed characteristics such as the totals of the rows and/or 
columns, or of some of them, the total sum of the matrix, or we have bounds on some of 
these quantities, or in addition we know the values of some individual elements or have 
bounds on them. The problem then is how to infer all the matrix elements from this 
information, and, in this paper, we are interested in solving the problem analytically. It 
is well known how to find numerical solutions to these inference problems by numerical 
entropy maximization. 

Most likely matrices and maximum entropy. We approach the problem by regarding
the matrix as constructed from a known number of elements (trips, traffic units, etc.), which
we will think of as balls, to be placed into an n × m array of boxes. We will refer to the
number of ways (assignments of balls to boxes) in which a given matrix X can be built
as its number of realizations, $\#(X)$. If the information $I$ is known about X, we may also
regard it as constraints that X has to satisfy, and we write $\#(X \mid I)$ for the number of
realizations of X that accord with $I$ or satisfy the constraints $I$. For example, if what we
know about the 2 × 2 matrix X, $x_{ij} \in \mathbb{N}$, is that its row sums are 7 and 3, some possibilities
are

$$X_1 = \begin{pmatrix} 3 & 4 \\ 2 & 1 \end{pmatrix},\quad X_2 = \begin{pmatrix} 4 & 3 \\ 2 & 1 \end{pmatrix},\quad X_3 = \begin{pmatrix} 4 & 3 \\ 1 & 2 \end{pmatrix},\quad X_4 = \begin{pmatrix} 5 & 2 \\ 2 & 1 \end{pmatrix},\quad X_5 = \begin{pmatrix} 6 & 1 \\ 3 & 0 \end{pmatrix}.$$

In fact there are 8 possible 1st rows and 4 possible 2nd rows, so 32 matrices satisfy
these constraints. Further, for the above examples, $\#(X_1 \mid I) = \#(X_2 \mid I) = \#(X_3 \mid I) =
10!/(1!\,2!\,3!\,4!) = 12600$, $\#(X_4 \mid I) = 10!/(5!\,2!\,2!\,1!) = 7560$, $\#(X_5 \mid I) = 10!/(1!\,3!\,6!) = 840$.
We will refer to the matrix X for which $\#(X \mid I)$, given by a multinomial coefficient, is
maximum, as the most likely matrix given the information/constraints $I$. "Most likely" may
have probability connotations for some, but we use it only as a shorthand for "matrix that

¹ We assume that the balls are distinguishable. The boxes are distinguishable, being particular elements
of a matrix.
can be realized in the greatest number of ways", which has nothing to do with probability;
it is merely counting.
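
As an illustration, the counts in the 2 × 2 example can be checked by brute force. The following is only a sketch under the stated assumption of distinguishable balls; the function names `realizations` and `matrices_with_row_sums` are our own and do not come from the paper, and the enumeration is practical only for tiny matrices.

```python
from itertools import product
from math import factorial

def realizations(matrix):
    """Number of ways to place sum(entries) distinguishable balls so that
    box (i, j) receives matrix[i][j] of them (a multinomial coefficient)."""
    entries = [x for row in matrix for x in row]
    count = factorial(sum(entries))
    for x in entries:
        count //= factorial(x)  # exact: every partial quotient is an integer
    return count

def matrices_with_row_sums(row_sums, m):
    """Brute-force list of all matrices over N with m columns and the given
    row sums (hypothetical helper, small cases only)."""
    rows_per_sum = [
        [row for row in product(range(u + 1), repeat=m) if sum(row) == u]
        for u in row_sums
    ]
    return [list(rows) for rows in product(*rows_per_sum)]

candidates = matrices_with_row_sums([7, 3], 2)
best = max(candidates, key=realizations)
print(len(candidates), best, realizations(best))
```

The maximizer is not unique here: any matrix whose entries are 3, 4 in the first row and 1, 2 in the second attains the same count.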

If the constraints are convex (in this paper they will be linear), and they specify the
sum of all the elements, the discrete problem of maximizing $\#(X \mid I)$ can be turned into a
continuous concave maximization problem via the Stirling approximation to the factorial:
the log of the multinomial coefficient is approximated by the entropy of the $x_{ij}$. This
continuous approximation is well-known, and in fact dates back to Boltzmann's (1847-1906)
combinatorial formulation of statistical mechanics where molecules are assigned to
boxes; see e.g. [Som67]. Thus our discrete most likely matrix problem connects to the
extensive body of work on maximum entropy (MaxEnt): see the works [Ros83], [Jay03] of
E. T. Jaynes, the books [Tri69], [KK92], and the series of MaxEnt conference proceedings
[MAX98] and [MAX09],² to name a few. So, as long as the total sum is known, the discrete
most likely matrix problem and its continuous MaxEnt analogue are equivalent to within 
the Stirling approximation, and we will sometimes refer to one, sometimes to the other. 
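
To give a feeling for the quality of this approximation, the sketch below compares the exact log of the multinomial coefficient with the entropy form $s\ln s - \sum x\ln x$ obtained from the first two Stirling terms. The entries (3, 4, 2, 1) and the scale factors are arbitrary illustrative choices, and the function names are ours.

```python
from math import lgamma, log

def log_realizations(entries):
    """Exact ln of s!/prod(x!) computed with the log-gamma function."""
    s = sum(entries)
    return lgamma(s + 1) - sum(lgamma(x + 1) for x in entries)

def combinatorial_entropy(entries):
    """s ln s - sum x ln x: the two-term Stirling approximation of the above."""
    s = sum(entries)
    return s * log(s) - sum(x * log(x) for x in entries if x > 0)

relative_errors = []
for scale in (1, 10, 100, 1000):
    entries = [scale * x for x in (3, 4, 2, 1)]
    exact = log_realizations(entries)
    approx = combinatorial_entropy(entries)
    relative_errors.append((approx - exact) / exact)
    print(scale, round(exact, 2), round(approx, 2))
```

The relative error shrinks as the entries grow, which is the sense in which the two problems are equivalent "to within the Stirling approximation".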

The combinatorial rationale that we consider here is appealing because of its simplicity: 
it is just counting. In addition, MaxEnt has intuitive appeal as maximizing uncertainty 
while conforming to precisely the available information. More importantly, it has a powerful 
axiomatic basis as well: see [Ski89], and [CG06] for recent developments.



Summary and background. In this paper we derive analytical, closed-form solutions
to a set of maximum entropy problems having to do with n × m matrices subject to linear
constraints. The constraints have the form of equalities or inequalities (upper bounds) on
sums over various subsets of the matrix, e.g. rows, columns, the whole matrix, the diagonal,
individual elements, etc. In §2 to §5 we consider known row, column, and total sums, as
well as upper bounds on them. We observe that when the total sum is not known, the
most likely matrix is not the MaxEnt matrix, but it has a simple relationship to a certain
MaxEnt matrix. In §6 we consider upper bounds on row sums and on individual elements.
Finally, in §7 we investigate the effect of having symmetric information in combination with
bounds on sums and specified individual elements, including an extension to 3-dimensional
matrices. A table later in the paper summarizes the types of constraints that we consider. In the most
complicated cases the solutions depend on the root of a single non-linear equation, but even
in those cases we find an analytical power series approximation to the root, hence to the
matrix elements themselves. The analytical forms allow us to treat matrices of arbitrary
size, reveal the exact structure of the most likely/MaxEnt matrix, and allow us to see
explicitly the robustness of the solution to changes in the constraints, and its behavior with
respect to uncertainty in the data. These features demonstrate the logical precision of the
MaxEnt method and are inaccessible via numerical solutions.

An extensive and in-depth study of MaxEnt matrices in transportation analysis is
[ES90]; an introduction can be found in [KK92], and a recent reference is [BD08]. Various
aspects of matrices characterizing traffic in IP networks, including numerical estimation
from incomplete data, are studied in [A(JR+06], [ZRLD05], and the references therein.³ A
semi-analytical derivation of most likely traffic matrices subject to a total cost constraint
is in [KU08]. With respect to contingency tables, [KK92] provides an introduction while
[Goo63] derives fundamental results on the "vanishing of interactions" in MaxEnt
multi-dimensional tables. For a small sample of other applications of MaxEnt see [CG02] and
[Sen91] (economics and econometrics), [KT92] and [KM093] (queueing problems), and
[TJI02] (systems theory).

² The latter with the unfortunate adoption of Microsoft Word for the typesetting of mathematical papers.



2 Specified row sums and some column sums 

We begin by considering a small extension of a problem whose solution is already known 
in the literature in order to introduce the concepts and general methodology used in the 
rest of the paper. Phrasing the discussion in terms of an n x m matrix X describing the 
traffic from n origins to m destinations, suppose we have the following information (or 
constraints) $I$ about it:

1. The total traffic from each origin: $\forall i,\ \sum_j x_{ij} = u_i$.

2. The total traffic to each of the first $\ell \le m$ destinations: $\forall j \le \ell,\ \sum_i x_{ij} = v_j$.

We assume that the information is consistent, i.e. $\sum_i u_i \ge \sum_{j \le \ell} v_j$. This information also
specifies the total traffic s in the network: $s = \sum_i u_i$.

To find the most likely traffic matrix X that follows from the information $I$, given that
the sum of all the entries is s, we construct X by distributing the s units of traffic into nm
boxes so that $x_{ij}$ of them go in box (i, j). The number of ways in which this can be done
is

$$\#(X \mid s) = \binom{s}{x_{11},\dots,x_{1m},\,x_{21},\dots,x_{2m},\,\dots,\,x_{n1},\dots,x_{nm}} = \frac{s!}{\prod_{i,j} x_{ij}!}, \tag{2.1}$$

where the notation indicates that s is known. To render the maximization of $\#(X \mid s)$
tractable, and, at the same time, achieve a relatively simple solution, we treat it as a
continuous problem and maximize the log of $\#(X \mid s)$ using the Stirling approximation

$$\ln x! = x\ln x - x + \tfrac{1}{2}\ln x + \ln\sqrt{2\pi} + \frac{\vartheta}{12x}, \qquad \vartheta \in (0,1), \tag{2.2}$$

which is defined for all $x > 0$ by $x! = \Gamma(x+1)$. Using the first two terms of (2.2) and
noting that $\sum_{i,j} x_{ij} = s$ is given, the problem becomes

$$\text{maximize}\quad -\sum_{i,j} x_{ij}\ln x_{ij}, \tag{2.3}$$



³ We say "estimation" because the methods used do not have the same logical standing as MaxEnt.
Also, the problem of estimating traffic in a real IP network is significantly more complex than the problems
considered in this paper.
subject to

$$\sum_j x_{ij} = u_i \ \text{ for } i = 1,\dots,n, \qquad \sum_i x_{ij} = v_j \ \text{ for } j = 1,\dots,\ell.$$

The expression to be maximized is the entropy of the set of demands $x_{ij}$. (Usually, e.g. in
information theory, entropy is defined for a vector whose entries sum to 1. What we use
here is more properly referred to as combinatorial, as opposed to information, entropy.⁴)
Because the entropy is a strictly concave function, the problem (2.3) has a unique solution
which can be found by forming the Lagrangean (details in §A.1)

$$\Phi = -\sum_{i,j} x_{ij}\ln x_{ij} - \sum_i \lambda_i\Bigl(\sum_j x_{ij} - u_i\Bigr) - \sum_{j \le \ell} \mu_j\Bigl(\sum_i x_{ij} - v_j\Bigr).$$

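It follows that $x_{ij} = e^{-\lambda_i-\mu_j-1}$ if $j \le \ell$ and $e^{-\lambda_i-1}$ if $j > \ell$. Denoting $e^{-\lambda_i-1}$ by $\lambda_i'$ and
$e^{-\mu_j}$ by $\mu_j'$, and then eliminating the primes to simplify the notation,

$$x_{ij} = \begin{cases} \lambda_i\mu_j, & j \le \ell, \\ \lambda_i, & j > \ell, \end{cases} \qquad \lambda_i, \mu_j > 0. \tag{2.4}$$

The origin and destination constraints imply that

$$\begin{aligned} \lambda_i(\mu_1 + \dots + \mu_\ell + m - \ell) &= u_i, & i &= 1,\dots,n, \\ (\lambda_1 + \dots + \lambda_n)\,\mu_j &= v_j, & j &= 1,\dots,\ell. \end{aligned} \tag{2.5}$$

Adding the first set of constraints together and doing the same with the second set we get
$(\lambda_1 + \dots + \lambda_n)(\mu_1 + \dots + \mu_\ell + m - \ell) = u_1 + \dots + u_n = s$ and
$(\lambda_1 + \dots + \lambda_n)(\mu_1 + \dots + \mu_\ell) = v_1 + \dots + v_\ell$.
If we now let $\lambda$ be the sum of the $\lambda_i$ and $\mu$ that of the $\mu_j$, it follows that if $\ell < m$

$$\lambda = \frac{s - (v_1 + \dots + v_\ell)}{m - \ell}, \qquad \mu = \frac{(m-\ell)(v_1 + \dots + v_\ell)}{s - (v_1 + \dots + v_\ell)}.$$

So from (2.5),

$$\lambda_i = \frac{\bigl(s - (v_1 + \dots + v_\ell)\bigr)\,u_i}{(m-\ell)\,s}, \qquad \mu_j = \frac{(m-\ell)\,v_j}{s - (v_1 + \dots + v_\ell)}.$$

Using this in (2.4), we finally get

$$x_{ij} = \begin{cases} \dfrac{u_i v_j}{s}, & j \le \ell, \\[2ex] \dfrac{s - (v_1 + \dots + v_\ell)}{m - \ell}\,\dfrac{u_i}{s}, & j > \ell, \end{cases} \qquad i = 1,\dots,n. \tag{2.7}$$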



⁴ This usage goes back to Boltzmann's combinatorial formulation of statistical mechanics, see [Som67].
Now if $\ell = m$, (2.5) and the two equations following it become $\lambda_i\mu = u_i$, $\mu_j\lambda = v_j$, $\lambda\mu = s$,
from which it follows that $\lambda_i\mu_j = u_i v_j/s$; thus (2.7) is valid even when $\ell = m$. Therefore
the most likely matrix X consists of an $n \times \ell$ left-hand part whose entries are given in
the 1st line of (2.7), and a possibly empty right-hand part consisting of $m - \ell$ identical
columns, each of which is described by the 2nd line of (2.7).

The solution $x_{ij} = u_i v_j/s$ for all i, j is known as the gravity model for the traffic. This
model has its origins in transportation analysis, in connection with the numbers of trips
taken between n origins and m destinations, which are cities of known populations; X
is then referred to as a "trip matrix". See [KK92] for an introduction, and [ES90] for
an in-depth treatment. In the context of contingency tables this model is known as the
"independence model" under marginal constraints. An important generalization to
multi-dimensional $n_1 \times n_2 \times \cdots$ contingency tables is given in the classic paper [Goo63] of I. J.
Good.

The form of X is conceptually robust. For example, take the model with $n = m = \ell$
and suppose the destination constraints are removed. Then all the $\mu_j$ in (2.4) can be
taken equal to 1, and the solution is $x_{ij} = u_i/n$, $\forall j$. Similarly, if the source constraints
are removed, $x_{ij} = v_j/n$, $\forall i$. And if both types of constraints are removed leaving just
$\sum_{i,j} x_{ij} = s$, then $x_{ij} = s/n^2$. We see that MaxEnt yields independence and as much
symmetry/uniformity as possible, subject to the given information.
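
The closed form of this section is easy to evaluate and to check against the constraints. The sketch below is a direct transcription of the two-line formula for the entries, under the assumption that the information is consistent; the function name `gravity_like_matrix` and the small numerical example are ours.

```python
def gravity_like_matrix(u, v, m):
    """Entries of the most likely matrix when all n row sums u are known and
    the sums v of the first len(v) columns are known; m is the number of
    columns (len(v) <= m). Illustrative sketch, not a library routine."""
    ell = len(v)
    s = sum(u)            # total traffic, fixed by the row sums
    v_total = sum(v)
    if ell > m or v_total > s:
        raise ValueError("inconsistent information")
    free = m - ell        # number of columns without a known sum
    matrix = []
    for u_i in u:
        left = [u_i * v_j / s for v_j in v]          # constrained columns
        right = [(s - v_total) * u_i / (free * s)] * free if free else []
        matrix.append(left + right)                  # identical free columns
    return matrix

X = gravity_like_matrix(u=[6, 4], v=[5], m=3)
for row in X:
    print(row)
```

In this example the rows sum to 6 and 4, the first column sums to 5, and the two unconstrained columns are identical, as the formula predicts.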

3 Bounds on row sums 

Suppose that the only information we have on the n × m matrix X is upper bounds on the
row sums:

$$\forall i, \quad \sum_j x_{ij} \le u_i. \tag{3.1}$$

We will first show that with $x_{ij} \in \mathbb{N}$, the most likely matrix X has its row sums in fact
equal to $u_1, \dots, u_n$. Indeed, let X be a matrix satisfying the constraints (3.1), and with
$\sum_{i,j} x_{ij} = s$. Suppose that row i sums to strictly less than $u_i$. This means that there is a
j such that if we increase $x_{ij}$ by 1, the resulting matrix X' also satisfies the constraints.
By (2.1), X' is more likely than X:

$$\frac{\#(X')}{\#(X)} = \frac{(s+1)!}{s!}\,\frac{x_{ij}!}{(x_{ij}+1)!} = \frac{s+1}{x_{ij}+1} \ge 1.$$

Proceeding in this way we can keep increasing the elements of the matrix while also
increasing the value of $\#(X)$, until all constraints are satisfied with equality and the rows
sum to exactly $u_1, \dots, u_n$. This reduces the problem to the one considered in §2 where the
total demand from each origin is known (as well as the total demand in the whole network).
So the solution to (3.1) is simply

$$\forall i, j, \quad x_{ij} = \frac{u_i}{m}. \tag{3.2}$$
This answer depends exactly on the given information and on nothing else. The argument
we gave above also shows that lower bounds on the row sums are immaterial.

Example 1. Suppose we have a 10 × 10 matrix, and the upper bounds on the row sums
are $u_1, \dots, u_{10}$ = 20, 20, 24, 30, 30, 36, 36, 36, 36, 40, measured in some units. We then find
that the total number of matrices that accord with the information $I$ is

$$M(I) = 30045015^2 \cdot 131128140 \cdot 847660528^2 \cdot 4076350421^4 \cdot 10272278170 \approx 2.41 \cdot 10^{89}.$$

(The number of solutions in $\mathbb{N}$ of the equation $x_1 + x_2 + \dots + x_{10} = u_i$ is simply the number
of compositions of $u_i$ into 10 parts, equal to $\binom{u_i+9}{9}$. And the inequality version can be
handled by summing $\binom{b+9}{9}$ over $0 \le b \le u_i$.) The most likely matrix X is one of these
$2.4 \cdot 10^{89}$ matrices. We can also find the number of matrices that satisfy (3.1) with equality.
This turns out to be

$$M(I_=) = 10015005^2 \cdot 38567100 \cdot 211915132^2 \cdot 886163135^4 \cdot 2054455634 \approx 2.20 \cdot 10^{83}$$

and X is one of these matrices. By (2.1) and (3.2), X can be realized in $\#(X) =
308!/\bigl((2!)^2\, 2.4!\, (3!)^2\, (3.6!)^4\, 4!\bigr)^{10} \approx 1.46 \cdot 10^{549}$ ways, where we took some liberties by allowing
non-integral entries.

How much more likely is X than a matrix X' which also obeys $I$ and is the same as
X except that its 5th row is (2,2,2,2,2,4,4,4,4,4), a slight deviation from (3,...,3)? We see
that $\#(X)/\#(X') = (2!)^5(4!)^5/(3!)^{10} \approx 4.21$. If row 8 is (2,2,2,2,2,2,2,6,8,8) instead of
(3.6,...,3.6), a larger deviation, the likelihood of X' is significantly smaller: $\#(X)/\#(X') =
(2!)^7\, 6!\, (8!)^2/(3.6!)^{10} \approx 813.9$. Note that the units chosen for the $u_i$ affect the size of the
absolute numbers above, as well as the ratios; choosing finer units increases both the
numbers and the ratios dramatically. For example, if all the $u_i$ are multiplied by 10, the
two likelihood ratios computed above become $1.8 \cdot 10^{7}$ and $4.2 \cdot 10^{32}$.
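
The counts in Example 1 can be recomputed with exact integer arithmetic. The sketch below assumes Python 3.8 or later (for `math.comb` and `math.prod`); the helper names are ours, and non-integral factorials such as 3.6! are evaluated as Γ(4.6).

```python
from math import comb, factorial, gamma, log10, prod

u = [20, 20, 24, 30, 30, 36, 36, 36, 36, 40]   # row-sum bounds of Example 1
m = 10                                          # number of columns

def rows_with_sum(total, parts):
    """Rows in N^parts whose elements sum to exactly `total`."""
    return comb(total + parts - 1, parts - 1)

def rows_with_sum_at_most(bound, parts):
    """Rows in N^parts whose elements sum to at most `bound`; equals the sum
    of rows_with_sum(b, parts) over b = 0..bound."""
    return comb(bound + parts, parts)

M_le = prod(rows_with_sum_at_most(u_i, m) for u_i in u)   # obey the bounds
M_eq = prod(rows_with_sum(u_i, m) for u_i in u)           # meet them exactly

# Likelihood ratios for the two perturbed rows discussed in the example.
ratio_row5 = factorial(2)**5 * factorial(4)**5 / factorial(3)**10
ratio_row8 = factorial(2)**7 * factorial(6) * factorial(8)**2 / gamma(4.6)**10

print(round(log10(M_le), 2), round(log10(M_eq), 2))
print(round(ratio_row5, 2), round(ratio_row8, 1))
```

Because the constraints are separable by row, both counts are products of per-row binomial coefficients, which is why matrices of this size pose no difficulty.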

4 Total sum and bounds on row sums 

Now suppose that besides the upper bounds on the row sums we also know the total sum
s:

$$\sum_{i,j} x_{ij} = s, \quad\text{and}\quad \forall i,\ \sum_j x_{ij} \le u_i. \tag{4.1}$$

For a solution to exist, we must have $s \le u_1 + \dots + u_n$. By Corollary A.1 we then have

$$x_{ij} = \lambda_i\mu, \qquad 0 < \lambda_i \le 1. \tag{4.2}$$

To proceed, we consider the solution to a simpler problem: given $a, b_1, \dots, b_n > 0$, what is
the maximum entropy vector $x^*$ satisfying $x_1 + \dots + x_n = a$ and $\forall i,\ x_i \le b_i$?
4.1 The vector case

Lemma 4.1 The maximum-entropy vector $x^*$ satisfying $\sum_i x_i = a$ and $\forall i,\ x_i \le b_i$,
where $a \le b_1 + \dots + b_n$, is found as follows:

i. Arrange the $b_i$ in increasing order, and permute the $x_i$ accordingly.
ii. Find the largest $j \in \{0, \dots, n\}$ for which $b_1 + \dots + b_j + (n-j)b_j \le a$ (for $j = 0$ the
left-hand side is taken to be 0). Let that be k.
Then $x_1^* = b_1, x_2^* = b_2, \dots, x_k^* = b_k$ and $x_{k+1}^* = \dots = x_n^* = \dfrac{a - (b_1 + \dots + b_k)}{n - k}$.

The starting point for this result is noting that if the $b_i$ are in increasing order, there
is a unique $\ell$ s.t. $b_1 + \dots + b_\ell < a \le b_1 + \dots + b_{\ell+1}$. If so, a plausible high-entropy solution
is to set the first $\ell$ of the $x_i$ (constrained to be smallest) equal to their upper bounds, and
split the remainder of a, which does not exceed $b_{\ell+1}$, equally among the rest of the $x_i$,
which are the more loosely constrained. Lemma 4.1 refines this idea: to actually achieve
maximum entropy, only the first $k \le \ell$ of the $x_i$ can be set to their upper bounds.

The significance of k is as follows. Suppose $b_1 > a/n$; then k = 0, and $b_2, \dots, b_n$ are
also $> a/n$. This means that the bounds on the $x_i$ are loose enough to allow complete
symmetry/uniformity: the MaxEnt solution is $x_1^* = \dots = x_n^* = a/n$. Now suppose that
$b_1 \le a/n$ and $b_1 + (n-1)b_2 > a$, in which case k = 1. Then the bound $b_1$ is restrictive
enough to break the symmetry: the solution is $x_1 = b_1$, $x_2 = \dots = x_n = (a - b_1)/(n-1)$,
symmetric apart from $x_1$. So, in general, k measures how many of the constraints on the
individual $x_i$ are informative, i.e. force the solution away from the total uniformity that
would have obtained if only the constraint $x_1 + \dots + x_n = a$ had been present. Finally,
k = n iff $b_1 + \dots + b_n = a$. In that extreme, the solution is determined completely by the
upper bounds: $x^* = (b_1, \dots, b_n)$.
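
The two steps of the lemma translate into a few lines of code. The following is a minimal sketch, assuming that the bounds may be given in any order and that the result should be returned in the original order; the function name `maxent_bounded_vector` is ours.

```python
def maxent_bounded_vector(a, b):
    """Maximum-entropy x with sum(x) == a and x[i] <= b[i], following the two
    steps of the lemma. Returns (x, k), with x in the original order of b."""
    n = len(b)
    if a > sum(b):
        raise ValueError("no solution: a exceeds b_1 + ... + b_n")
    order = sorted(range(n), key=lambda i: b[i])   # step i: increasing bounds
    sorted_b = [b[i] for i in order]
    k = 0
    prefix = 0       # b_1 + ... + b_j for the current j
    prefix_k = 0     # b_1 + ... + b_k for the largest qualifying j so far
    for j in range(1, n + 1):
        prefix += sorted_b[j - 1]
        if prefix + (n - j) * sorted_b[j - 1] <= a:   # step ii
            k, prefix_k = j, prefix
    # The remainder of a is split equally among the n - k loosest bounds.
    shared = (a - prefix_k) / (n - k) if k < n else None
    x = [0.0] * n
    for position, i in enumerate(order):
        x[i] = float(sorted_b[position]) if position < k else shared
    return x, k

print(maxent_bounded_vector(10, [9, 1, 5]))
```

Since the left-hand side of the condition in step ii is non-decreasing in j, a single pass over the sorted bounds finds the largest qualifying j.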



4.2 Back to the matrix 

Returning to the solution (4.2), we proceed along the lines of the proof of Lemma 4.1 in
the Appendix. We treat the $u_i$ as the $b_i$ of the lemma: arrange the rows of X so that
$u_1 \le u_2 \le \dots \le u_n$, and find the largest k s.t.

$$u_1 + \dots + u_k + (n-k)u_k \le s. \tag{4.3}$$

It may be that k = 0, i.e. $u_1 > s/n$, but k cannot exceed n. As pointed out above, the
number k measures how many of the row constraints are informative. Now consider the
solution

$$\sum_j x_{1j} = u_1,\ \dots,\ \sum_j x_{kj} = u_k, \qquad \lambda_{k+1} = \dots = \lambda_n = 1. \tag{4.4}$$

From (4.2), this implies that for all j, $x_{k+1,j} = \dots = x_{nj} = \mu$. Since the sum of all $x_{ij}$
must be s,

$$u_1 + \dots + u_k + (n-k)\,m\,\mu = s, \quad\text{so}\quad \mu = \frac{s - (u_1 + \dots + u_k)}{m(n-k)}.$$

Using this in (4.4),

$$\lambda_i = \frac{(n-k)\,u_i}{s - (u_1 + \dots + u_k)}, \qquad i \le k,$$

and we must verify that $\lambda_i \le 1$. But this holds if $s \ge u_1 + \dots + u_k + (n-k)u_i$, which is
true for any $i \le k$ because of (4.3).

In summary, with the rows of X arranged so that $u_1 \le u_2 \le \dots \le u_n$, the solution is

$$x_{ij} = \begin{cases} \dfrac{u_i}{m}, & i \le k, \\[2ex] \dfrac{s - (u_1 + \dots + u_k)}{m(n-k)}, & i > k, \end{cases} \qquad j = 1, \dots, m, \tag{4.5}$$

where k is determined by (4.3). The non-informative $u_{k+1}, \dots, u_n$ do not appear.

According to (4.5), X consists of k rows with the structure specified in
the 1st line, followed by n − k identical rows with the structure specified in the 2nd
line. Within each row all elements are equal, so the m columns of X are identical, because
we do not have any information that imposes a distinction among them. We also note that
if $s = \sum_i u_i$, then k = n, and we obtain
the solution (3.2), as expected, since this value of s imposes no additional constraint. If
$s/n \le \min_i u_i$, the matrix is totally uniform: $x_{ij} = s/(nm)$.

Finally, the solution (4.5) translates immediately to the case where we have bounds on
the columns, instead of the rows of the matrix.

Example 2. We re-do Example 1 adding information on the total sum s. Here $\sum_i u_i =
308$. We see from the last column of the table that $\#(X \mid s)$ increases with s, as intuitively
expected.

   s   k    x1.    x2.    x3.    x4.    x5.    x6.    x7.    x8.    x9.   x10.   log10 #(X|s)
 308  10     20     20     24     30     30     36     36     36     36     40 |  549.2
 307   9     20     20     24     30     30     36     36     36     36 |   39    547.3
 304   9     20     20     24     30     30     36     36     36     36 |   36    541.8
 303   5     20     20     24     30     30 | 35.8   35.8   35.8   35.8   35.8    539.9
 275   5     20     20     24     30     30 | 30.2   30.2   30.2   30.2   30.2    487.2
 274   5     20     20     24     30     30 |   30     30     30     30     30    485.3
 273   3     20     20     24 |29.86  29.86  29.86  29.86  29.86  29.86  29.86    483.4
 272   3     20     20     24 |29.71  29.71  29.71  29.71  29.71  29.71  29.71    481.5

Table 4.1: Row sums of the most likely 10 × 10 matrix X as a function of s. Within a row,
all elements are equal. The stepwise line formed by the bars | inside the table indicates the
k-boundary. The last column of the table is computed by (2.1).
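
The row sums and the last column of such a table can be recomputed from the closed form for the matrix with known total sum and bounded row sums. The sketch below is our own illustration: it assumes the non-integral factorials are evaluated through the log-gamma function, and the function names are hypothetical.

```python
from math import lgamma, log

def most_likely_matrix(s, u, m):
    """Most likely n x m matrix with total sum s and row sums bounded by u
    (any order). Returns (X, k), k being the number of informative bounds."""
    n = len(u)
    if s > sum(u):
        raise ValueError("no solution: s exceeds the sum of the row bounds")
    order = sorted(range(n), key=lambda i: u[i])
    su = [u[i] for i in order]
    k = prefix = prefix_k = 0
    for j in range(1, n + 1):
        prefix += su[j - 1]
        if prefix + (n - j) * su[j - 1] <= s:        # largest such j is k
            k, prefix_k = j, prefix
    X = [None] * n
    for position, i in enumerate(order):
        if position < k:
            entry = su[position] / m                 # row meets its bound
        else:
            entry = (s - prefix_k) / (m * (n - k))   # remaining rows are equal
        X[i] = [entry] * m
    return X, k

def log10_realizations(X):
    """log10 of (sum x)! / prod(x!), with x! taken as Gamma(x + 1)."""
    entries = [x for row in X for x in row]
    ln_count = lgamma(sum(entries) + 1) - sum(lgamma(x + 1) for x in entries)
    return ln_count / log(10)

u = [20, 20, 24, 30, 30, 36, 36, 36, 36, 40]
for s in (308, 307, 304, 303, 275, 274, 273, 272):
    X, k = most_likely_matrix(s, u, m=10)
    print(s, k, [round(sum(row), 2) for row in X], round(log10_realizations(X), 1))
```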
4.3 Bounds on total sum and on row sums

Suppose that instead of knowing the total sum as above, we have only an upper bound u
on it:

$$\sum_{i,j} x_{ij} \le u, \quad\text{and}\quad \forall i,\ \sum_j x_{ij} \le u_i. \tag{4.6}$$

What has already been said in this section suffices to solve this problem also. First, if
$u > \sum_i u_i$, then this constraint is immaterial and we have the problem of §3 whose
solution is given by (3.2). So we are left with the case $u \le \sum_i u_i$. Suppose that we pick
a value s < u for the total demand, and then find X as in §4.2. Example 2 showed that
$\#(X \mid s)$ increases as s increases, suggesting that we should reduce to the problem (4.1)
with s = u. Indeed, Lemma A.1 establishes this formally.

This is the first case where "most likely" is not equivalent to "having maximum
entropy". However, we see that there is still a strong and simple connection: the most likely
matrix is the MaxEnt matrix with the largest total sum allowed by the constraints.



5 Bounds on row and column sums 

Here we consider the situation where our information $I$ consists just of upper bounds on
both the row and column sums of the matrix:

$$\sum_j x_{ij} \le u_i,\ \ i = 1, \dots, n, \qquad \sum_i x_{ij} \le v_j,\ \ j = 1, \dots, m.$$

The number of realizations of a matrix subject to this information is given by expression
(2.1), except in this case the total sum s is not known and has to be substituted by $\sum_{i,j} x_{ij}$.
If we use the first two terms of (2.2) to approximate the log of

$$\#(X) = \binom{x_{11} + \dots + x_{nm}}{x_{11}, \dots, x_{1m},\, x_{21}, \dots, x_{2m},\, \dots,\, x_{n1}, \dots, x_{nm}}$$

we find that it is given by the "entropy difference" function

$$G(X) = \Bigl(\sum_{i,j} x_{ij}\Bigr)\ln\Bigl(\sum_{i,j} x_{ij}\Bigr) - \sum_{i,j} x_{ij} - \sum_{i,j}\bigl(x_{ij}\ln x_{ij} - x_{ij}\bigr)
= \Bigl(\sum_{i,j} x_{ij}\Bigr)\ln\Bigl(\sum_{i,j} x_{ij}\Bigr) - \sum_{i,j} x_{ij}\ln x_{ij}. \tag{5.1}$$

(When $I$ includes the value of $\sum_{i,j} x_{ij}$, maximizing G(X) subject to $I$ is equivalent to
maximizing H(X) subject to $I$.) Proposition A.2 in the Appendix shows that G(X) is
concave over the domain $x_{ij} > 0$. And by Corollary A.2 the elements of X have the form

$$x_{ij} = \Bigl(\sum_{k,l} x_{kl}\Bigr)\lambda_i\mu_j, \qquad \lambda_i, \mu_j \in (0, 1]. \tag{5.2}$$

Given the above, we note that there are two cases to consider w.r.t. the bounds:
1. All rows sum to their bounds, and all columns sum to their bounds.

2. At least one row or one column sums to less than its bound.

Case 1 is possible only when $\sum_i u_i = \sum_j v_j$. If so, the solution s.t. $\forall i,\ \sum_j x_{ij} = u_i$ and
$\forall j,\ \sum_i x_{ij} = v_j$ has been discussed in §2. Thus we need only consider case 2. We can
establish the following property of X:

Proposition 5.1 The matrix X is s.t. for any i, j pair, either row i sums to $u_i$, or column
j sums to $v_j$. That is, there can be no pair i, j s.t. row i sums to $< u_i$ and column j sums
to $< v_j$.

By virtue of Proposition 5.1, if one column of X sums to less than its bound, then all
rows must sum to their bounds. The situation is symmetric w.r.t. rows and columns, so
we will analyze just the column case, where one or more columns sum to less than their
bounds.

So suppose that columns 1, ..., k sum to their bounds, while columns k+1, ..., m sum
to less than their bounds, with $0 \le k < m$. Then we must have $v = \sum_j v_j > \sum_i u_i = u$.
By Corollary A.2, $\mu_{k+1} = \dots = \mu_m = 1$ in (5.2). Also, as pointed out above, all rows must
sum to their bounds, which implies that $\sum_{i,j} x_{ij} = u$.

If we consider the columns, (5.2) says that $x_{ij} = u\lambda_i\mu_j$ for $j \le k$, and $x_{ij} = u\lambda_i$ for
$j > k$. Adding these by sides over i we obtain

$$v_j = u\lambda\mu_j,\ \ j \le k \quad\text{and}\quad v_{k+1} > u\lambda,\ \dots,\ v_m > u\lambda, \tag{5.3}$$

where $\lambda$ is the sum of the $\lambda_i$. Further, if we add all the columns,
$v_1 + \dots + v_k + u\lambda + \dots + u\lambda = u$, whence

$$\lambda = \frac{u - (v_1 + \dots + v_k)}{(m-k)\,u}. \tag{5.4}$$

($\lambda = 1/m$ if k = 0, i.e. if all columns sum to less than their bounds.) Turning to the rows,
we have $u\lambda_1\mu = u_1, \dots, u\lambda_n\mu = u_n$, where $\mu$ is the sum of the $\mu_j$. Thus

$$\lambda_i\mu = \frac{u_i}{u},\ \ i = 1, \dots, n, \quad\text{and}\quad \lambda\mu = 1. \tag{5.5}$$

We can now determine all the $\lambda_i$ and $\mu_j$: from (5.5) and (5.3),

$$\lambda_i = \lambda\,\frac{u_i}{u}, \quad\text{and}\quad \mu_j = \begin{cases} v_j/(u\lambda), & j \le k, \\ 1, & j > k. \end{cases} \tag{5.6}$$

Note that neither $\lambda_i$ nor $\mu_j$ depend on $v_{k+1}, \dots, v_m$. This means that we can take these
bounds to be as large as we please, e.g. $\infty$, thus handling the case where no upper bound
is specified for some of these columns.
It remains to take care of the fact that (5.2) requires $\lambda_i, \mu_j \le 1$. It is easy to verify this
for $\lambda_i$: it is the product of two factors, both $\le 1$. The condition $\mu_j \le 1$ is equivalent to
$(m-k)v_j \le u - (v_1 + \dots + v_k)$ for $j \le k$. The inequalities in (5.3) impose another condition
on k: $(m-k)\min(v_{k+1}, \dots, v_m) > u - (v_1 + \dots + v_k)$. Taking these two conditions together
we see that k and the column bounds $v_j$ must satisfy

$$\max(v_1, \dots, v_k) \le \frac{u - (v_1 + \dots + v_k)}{m - k} < \min(v_{k+1}, \dots, v_m),$$

where $0 \le k < m$. Assume that the $v_j$ are in increasing order.⁵ Then this condition
becomes

$$v_k \le \frac{u - (v_1 + \dots + v_k)}{m - k} < v_{k+1}. \tag{5.7}$$

The following result establishes the existence of a k satisfying (5.7):

Proposition 5.2 Let $v_1 \le v_2 \le \dots \le v_m$, $u < v$, and $v_1 \le u/m$. Then there is a unique
$k \in \{1, \dots, m-1\}$ s.t.

$$(m-k)\,v_k \le u - (v_1 + \dots + v_k) < (m-k)\,v_{k+1}.$$

If $v_1 > u/m$, then k = 0.

Finally, if $u < v$, by (5.2), (5.4), and (5.6), the elements of X are

$$x_{ij} = \begin{cases} \dfrac{u_i v_j}{u}, & j \le k, \\[2ex] \dfrac{u - (v_1 + \dots + v_k)}{m - k}\,\dfrac{u_i}{u}, & j > k, \end{cases} \qquad i = 1, \dots, n, \tag{5.8}$$

where k is given by Proposition 5.2. We note that k is the number of informative column
constraints, in the sense that the solution depends on $v_1, \dots, v_k$ but not on $v_{k+1}, \dots, v_m$ (similar
to the k in Lemma 4.1). In fact, some of $v_{k+1}, \dots, v_m$ may be infinite, i.e. there may be
no upper bounds on some of columns k+1, ..., m.

The reader may want to compare (5.8) with the result (2.7) for the model of §2. The
comparison shows that even though for the problem we just solved "most likely" is not
directly equivalent to "having maximum entropy", there is still a straightforward connection
as we also saw in §4.3.
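
The solution of this section can likewise be evaluated mechanically. The sketch below covers only the case analyzed above, in which the row bounds add up to less than the column bounds; it finds k as the largest index satisfying the left inequality of Proposition 5.2 (the right inequality then holds automatically), and accepts column bounds in any order. The function name and the numerical example are ours.

```python
def most_likely_matrix_rc_bounds(u_rows, v_cols):
    """Most likely matrix under upper bounds u_rows on row sums and v_cols on
    column sums, assuming sum(u_rows) < sum(v_cols). Returns (X, k)."""
    n, m = len(u_rows), len(v_cols)
    u = sum(u_rows)                     # all rows meet their bounds
    if u >= sum(v_cols):
        raise ValueError("this sketch only covers the case u < v")
    order = sorted(range(m), key=lambda j: v_cols[j])   # increasing bounds
    sv = [v_cols[j] for j in order]
    k = prefix = prefix_k = 0
    for j in range(1, m):               # candidates k = 1, ..., m-1
        prefix += sv[j - 1]
        if (m - j) * sv[j - 1] <= u - prefix:
            k, prefix_k = j, prefix
    X = [[0.0] * m for _ in range(n)]
    for position, j in enumerate(order):
        for i, u_i in enumerate(u_rows):
            if position < k:            # informative column: meets its bound
                X[i][j] = u_i * sv[position] / u
            else:                       # remaining columns share the rest
                X[i][j] = (u - prefix_k) / (m - k) * u_i / u
    return X, k

X, k = most_likely_matrix_rc_bounds([6, 4], [100, 2, 100])
print(k, X)
```

In this example only the column with bound 2 is informative; the other two bounds could be replaced by `float("inf")` without changing the result.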



6 Bounds on individual elements 

We first point out that whereas bounds on individual matrix elements provide the utmost 
flexibility in expressing constraints, they can have unintended consequences. Then we look 
at the most likely matrix subject to bounds on the row sums and on individual elements. 

⁵ If the matrix is a contingency table, simply rearrange the columns. If it refers to n nodes, re-label the
nodes.

6.1 Expressive power and consistency
6.1.1 Expressive power

Consider finding the most likely matrix X subject just to the constraints

$$\forall i, j, \quad x_{ij} \le w_{ij},$$

where W is a given n × m matrix in $\mathbb{N}$. Then it is easy to see by an argument similar
to that of §3 that X has elements $x_{ij} = w_{ij}$. Thus the information W suffices to specify
any possible most likely matrix. Conversely, to be able to specify an arbitrary matrix,
information on every matrix element is necessary; W is one form of such information.



6.1.2 Consistency 

Imposing w-constraints that are satisfied with equality requires that the w- and u-constraints
together satisfy certain conditions if the matrix X is not to exhibit surprising behavior.
For example, suppose we are trying to determine a 3 × 3 matrix with row/column sums
$u_1, u_2, u_3$ and s.t. $x_{11} = 0$, as shown in Fig. 6.1 left. Then we must have $x_{12} + x_{13} = u_1$
from row 1, and $(x_{22} + x_{23} + x_{32} + x_{33}) + x_{12} + x_{13} = u_2 + u_3$ from columns 2 and 3. It
follows that if $u_1$ is not strictly less than $u_2 + u_3$, then $x_{22} + x_{23} + x_{32} + x_{33} = 0$, which
means that all these elements are 0. So with certain $u_1, u_2, u_3$, $x_{11} = 0$ may force other
elements of the matrix to be 0 as well. This does not happen without the requirement
$x_{11} = 0$: we know from §2 that for any $u_1, u_2, u_3$ there is an X with all elements non-zero.






Left:

      0     x12   x13  |  u1
      x21   x22   x23  |  u2
      x31   x32   x33  |  u3
      -----------------
      u1    u2    u3

Right:

      W     A   |  u1 ... uk
      B     C   |  uk+1 ... un
      ----------------------
      u1 ... uk    uk+1 ... un

Figure 6.1: Matrices with some elements fixed by w-constraints.

More generally, suppose we have a constraint that forces a certain k × k square submatrix
of X to equal a matrix W. In the simplest case let W be in the upper left-hand corner of
X as shown in Fig. 6.1 right. Then we have

$$\Sigma_W + \Sigma_A = u_1 + \dots + u_k, \qquad \Sigma_A + \Sigma_C = u_{k+1} + \dots + u_n,$$

assuming that $\Sigma_W < u_1 + \dots + u_k$. It can be seen that unless u and W are s.t.
$u_1 + \dots + u_k - \Sigma_W < u_{k+1} + \dots + u_n$, we must have $\Sigma_C = 0$, which would force the entire submatrix
C to be 0.
If W is an arbitrary submatrix, let its rows and columns correspond to a set $I$ of indices.
Then the condition that must be satisfied so that C is not forced to 0 can be written as

$$u_I < \frac{u}{2} + \frac{w_{II}}{2}, \tag{6.1}$$

where the subscripts indicate summation over the set and $u = u_1 + \dots + u_n$ is the total. In terms of a traffic matrix, this
condition says that the traffic originating in the set $I$ must be less than half of the total,
plus half of the traffic originating in $I$ and terminating in $I$. As an example, suppose we
require that there is no traffic among the locations in $I$; then (6.1) says that the traffic
that leaves $I$ must be less than half of the total traffic.
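
The consistency condition just described is a one-line test. The sketch below encodes it as stated in words above (traffic originating in the set less than half the total plus half the traffic internal to the set); the function name, the 0-based indexing and the argument `w_total` (the sum of the fixed entries with both indices in the set) are our own conventions.

```python
def submatrix_c_not_forced_to_zero(u, index_set, w_total):
    """True if the complementary block C is not forced to vanish.
    u: common row/column sums; index_set: indices of the fixed square block;
    w_total: sum of the fixed entries within that block."""
    u_in_set = sum(u[i] for i in index_set)
    return u_in_set < 0.5 * sum(u) + 0.5 * w_total

# 3 x 3 example with the single element x11 fixed to 0:
print(submatrix_c_not_forced_to_zero([5, 2, 3], {0}, 0))   # u1 = u2 + u3
print(submatrix_c_not_forced_to_zero([4, 2, 3], {0}, 0))   # u1 < u2 + u3
```

For a single fixed element equal to 0 the test reduces to requiring that the first sum be strictly less than the sum of the other two, in agreement with the 3 × 3 example.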

Related to the above, there is also a necessary and sufficient condition for the existence 
of a non-negative matrix with specified row and column sums and an arbitrary subset of 
elements specified to be 0: see Theorems 3.10 and 3.12 in Ch. 4 of |BP94j : see also §3.6 of 
pS90] . 



6.2 Bounds on row sums and on individual elements 

Suppose we know the same bounds on the row sums as in §3, but, in addition, we have a
bound on the size of each individual element:

$$\forall i \;\; \sum_j x_{ij} \le u_i, \quad \text{and} \quad \forall i, j \;\; x_{ij} \le w_{ij}. \tag{6.2}$$

This problem is easy to solve because the constraints (6.2) are separable, so each row of the
most likely matrix $X$ can be found independently of all the other rows. Fixing a particular
row $i$, denote the $x_{ij}$ by $x_1, \ldots, x_m$, $u_i$ by $a$, and the $w_{ij}$ by $b_1, \ldots, b_m$. Then we have the
problem of finding the most likely vector $x^*$ that satisfies

$$x_1 + \cdots + x_m \le a, \quad x_1 \le b_1, \ldots, x_m \le b_m, \quad a, b_j \in \mathbb{N}. \tag{6.3}$$

The solution to this problem is as follows. If $a > b_1 + \cdots + b_m$, $x^*$ is simply $(b_1, \ldots, b_m)$.
If $a \le b_1 + \cdots + b_m$, then $x^*$ is found by replacing the inequality with an equality and
reducing to the problem solved in §4. The formal details are given in Lemma A.1 in the
Appendix.
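The row-by-row procedure can be sketched in a few lines of Python. The sketch below works with real numbers (the approximation used in the Appendix) and follows the closed form that appears in the proof of Lemma 4.1: sort the bounds, find the largest $k$ with $a - (b_1 + \cdots + b_k) \ge (m-k) b_k$, set the $k$ smallest elements to their bounds, and share the remainder equally. The function name `most_likely_row` is ours.

```python
def most_likely_row(a, b):
    """Real-valued solution of (6.3): x_i <= b_i for all i, x_1+...+x_m <= a."""
    m = len(b)
    if a >= sum(b):                       # the sum constraint is not binding
        return [float(v) for v in b]
    order = sorted(range(m), key=lambda i: b[i])
    sb = [b[i] for i in order]            # bounds in increasing order
    k, prefix, running = 0, 0.0, 0.0
    for j in range(1, m + 1):
        running += sb[j - 1]
        # phi(j) = a - (b_1+...+b_j) - (m-j) b_j is decreasing in j,
        # so we can stop at the first j where it becomes negative.
        if a - running >= (m - j) * sb[j - 1]:
            k, prefix = j, running
        else:
            break
    level = (a - prefix) / (m - k)        # common value of the unconstrained elements
    x = [0.0] * m
    for pos, i in enumerate(order):
        x[i] = float(sb[pos]) if pos < k else level
    return x
```

Applying `most_likely_row(u[i], w[i])` to each row $i$ then gives the whole matrix, since the constraints (6.2) are separable.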



7 Symmetric information 

We now investigate some types of constraints that we have not looked at so far, but under 
the additional assumption that these constraints or information are symmetric w.r.t. rows 
and columns. By necessity, the matrices are n x n square. Here is one motivation for 
considering symmetry. Suppose we are designing a "backbone" type, i.e. high capacity 
and geographically extensive, communications network connecting a set of n locations. The
transmission facilities in such a network typically have the same capacity in each direction.
To understand what capacities are needed for the links, we need an estimate of the traffic 
matrix. Given the symmetry of the capacities, we may assume, for example, that the total 
incoming and outgoing traffic for a given node are equal. The same considerations apply 
to a network of roads connecting a set of cities. Thus the symmetry of capacities allows us 
to act as if the traffic matrix were symmetric. 

These considerations aside, the symmetric information allows us to go farther toward 
analytical solutions than would be possible otherwise. One of the questions we investigate 
via the analytical forms is the effect of fixing some elements on the "product of independent 
factors" structure of the MaxEnt matrix. 

7.1 Total sum and bounds on row and column sums 

Here the sum of row $i$ is bounded by $u_i$, and so is the sum of column $i$. By Corollary A.1
the matrix elements are of the form

$$x_{ij} = \lambda_i \mu_j \nu, \qquad \lambda_i, \mu_j \in (0, 1],$$

where $\lambda_i, \mu_j$ correspond to the row and column constraints respectively, and $\nu$ to the
constraint on the total sum. We will show that the solution is essentially the same as that
obtained in §4 for the non-symmetric, row-only case. So define $k$ by (4.3), and consider
the solution

1. Constraints $1, \ldots, k$ are satisfied as equalities for both rows and columns (so we must
have $\lambda_1, \ldots, \lambda_k \le 1$ and $\mu_1, \ldots, \mu_k \le 1$), and

2. $\lambda_{k+1} = \cdots = \lambda_n = 1$, and $\mu_{k+1} = \cdots = \mu_n = 1$.

It follows that the matrix must look like

$$X = \begin{pmatrix} [\lambda_i \mu_j \nu]_{k \times k} & [\lambda_i \nu]_{k \times (n-k)} \\ [\mu_j \nu]_{(n-k) \times k} & [\nu]_{(n-k) \times (n-k)} \end{pmatrix} = \begin{pmatrix} A & B \\ C & D \end{pmatrix}.$$

Note that rows $k+1, \ldots, n$ are identical, and so are columns $k+1, \ldots, n$. Let $\Sigma_A, \ldots, \Sigma_D$
denote the sums of the elements of the submatrices. Clearly,

$$\Sigma_A + \Sigma_B = u_1 + \cdots + u_k, \quad \Sigma_A + \Sigma_C = u_1 + \cdots + u_k, \quad \Sigma_A + \Sigma_B + \Sigma_C + \Sigma_D = s.$$

Therefore $\Sigma_B = \Sigma_C$ and $u_1 + \cdots + u_k + \Sigma_B + \Sigma_D = s$. Substituting for the elements of $B$
and $D$, we find that

$$\nu = \frac{s - (u_1 + \cdots + u_k)}{(n-k)(\lambda_1 + \cdots + \lambda_k + n - k)}.$$

And from $\Sigma_B = \Sigma_C$ it follows that $\lambda_1 + \cdots + \lambda_k = \mu_1 + \cdots + \mu_k$. Now the constraint on
row $i \le k$ is $\lambda_i(\lambda_1 + \cdots + \lambda_k + n - k)\nu = u_i$. Using the expression for $\nu$ in this we find
that $\lambda_i = (n-k)u_i/(s - (u_1 + \cdots + u_k))$, and this is $\le 1$ as required. Similarly, from the
constraint for column $j \le k$ we find that $\mu_j = (n-k)u_j/(s - (u_1 + \cdots + u_k))$. We see that
$\mu_j = \lambda_j$. Therefore we need only deal with the $\lambda_i$. Substituting the values of the $\lambda_i$ in the
expression for $\nu$, we finally arrive at the solution





$$x_{ij} = \begin{cases}
\dfrac{u_i u_j}{s}, & (i,j) \in A, \\[2mm]
\dfrac{\bigl(s - (u_1 + \cdots + u_k)\bigr)\, u_i}{(n-k)\, s}, & (i,j) \in B, \\[2mm]
\dfrac{\bigl(s - (u_1 + \cdots + u_k)\bigr)\, u_j}{(n-k)\, s}, & (i,j) \in C, \\[2mm]
\dfrac{\bigl(s - (u_1 + \cdots + u_k)\bigr)^2}{(n-k)^2\, s}, & (i,j) \in D,
\end{cases} \tag{7.1}$$

where $k$ is defined by (4.3). This solution is symmetric, and the reader can verify that
it satisfies all the constraints. The $A, B, C$ matrices have the gravity form, the $B$ and $C$
matrices are the transpose of one another, and the $D$ matrix is constant. Finally, note
that we did not assume the symmetry in the solution; it followed as a consequence of the
symmetric information.
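To make (7.1) concrete, here is a small Python sketch that builds the matrix from the four block formulas. It takes $k$ as an input rather than computing it from (4.3), and it assumes that the $u_i$ are ordered so that the first $k$ bounds are the ones met with equality; the function name is ours.

```python
def symmetric_bounded_matrix(u, s, k):
    """Assemble the matrix (7.1) for bounds u, total sum s and a given k < n."""
    n = len(u)
    t = s - sum(u[:k])                       # s - (u_1 + ... + u_k)
    x = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(n):
            if i < k and j < k:              # block A
                x[i][j] = u[i] * u[j] / s
            elif i < k:                      # block B
                x[i][j] = t * u[i] / ((n - k) * s)
            elif j < k:                      # block C
                x[i][j] = t * u[j] / ((n - k) * s)
            else:                            # block D
                x[i][j] = t * t / ((n - k) ** 2 * s)
    return x
```

With `u = [1, 2, 10, 10]`, `s = 10` and `k = 2`, the first two rows sum to their bounds 1 and 2, the last two rows each sum to 3.5, and the total is 10.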



7.2 Given row and column sums, partially fixed diagonal 

Assume that the sum of row and column $i$ is $u_i$, and that the first $m \le n$ of the diagonal
elements are fixed to be 0. Then, with $s = u_1 + \cdots + u_n$ still denoting the total sum, the
matrix elements other than the first $m$ on the diagonal must be given by

$$x_{ij} = s \lambda_i \mu_j, \qquad \lambda_i, \mu_j > 0.$$

Including the factor $s$ is a convenience, as will become clear. For the above solution to
be possible the consistency condition (6.1) must be satisfied: each $u_i$ must be strictly less
than half of $s$. We shall assume this to be the case. If $\Lambda$ is the sum of the $\lambda_i$ and $M$ that of
the $\mu_j$, and $r_i = u_i/s$, the row and column constraints can be written as

$$\begin{aligned}
\lambda_i (M - \mu_i) &= r_i, \; i \le m, & \lambda_i M &= r_i, \; i > m, \\
\mu_j (\Lambda - \lambda_j) &= r_j, \; j \le m, & \mu_j \Lambda &= r_j, \; j > m.
\end{aligned} \tag{7.2}$$

Here $r_1 + \cdots + r_n = 1$ and $r_i < 1/2$ for all $i$. Noting that (7.2) is unchanged if we exchange
the $\lambda_i$ and the $\mu_j$ leads us to consider a solution with $\mu_i = \lambda_i$. Then (7.2) reduces to

$$\lambda_i (\Lambda - \lambda_i) = r_i, \; i \le m, \qquad \lambda_i \Lambda = r_i, \; i > m. \tag{7.3}$$

(7.3) implies that for $i \le m$ we have $\lambda_i = \bigl(\Lambda \pm \sqrt{\Lambda^2 - 4r_i}\bigr)/2$, whereas for $i > m$, $\lambda_i = r_i/\Lambda$.
Suppose we pick the root with the "$-$" for $i = 1, \ldots, m$. Adding the expressions for the
$\lambda_i$ by sides and dividing both sides of the result by $\Lambda/2$ we see that $\Lambda$ must satisfy the
equation

$$\sqrt{1 - 4r_1/\Lambda^2} + \cdots + \sqrt{1 - 4r_m/\Lambda^2} - \frac{2(r_{m+1} + \cdots + r_n)}{\Lambda^2} = m - 2. \tag{7.4}$$

An exact analytical solution of (7.4) is impractical, but we can find an approximation. To
begin with, we observe that the l.h.s. of (7.4) is a monotone increasing function of $\Lambda$, so
the root of (7.4) is unique¹. Second, at the expense of restricting the $r_i$ somewhat, we can
localize the root:



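Proposition 7.1 Suppose that each of $r_1, \ldots, r_n$ is in $(0, 1/3]$, and $r_1 + \cdots + r_n = 1$. Then
for any $n \ge 3$ and any $m \le n$, equation (7.4) has a root in the interval $[2\sqrt{r_{\max}}, 4/3)$,
where $r_{\max}$ is the largest of the $r_i$.

To see the necessity for some additional restriction on the $r_i$, suppose that $m = n$ and
that we extend $(0, 1/3]$ to $(0, 1/2)$. Then consider the set $r_1 \approx \frac{1}{2}$ and $r_2 = \cdots = r_n \approx \frac{1}{2(n-1)}$;
it can be seen that (7.4) has no solution in $(\sqrt{2}, \infty)$.

In terms of the root $\Lambda$ of (7.4) the final solution is

$$x_{ij} = \begin{cases}
\dfrac{s\Lambda^2}{4} \Bigl(1 - \sqrt{1 - 4r_i/\Lambda^2}\Bigr)\Bigl(1 - \sqrt{1 - 4r_j/\Lambda^2}\Bigr), & i, j \le m \text{ and } i \ne j, \\[2mm]
\dfrac{s r_i}{2} \Bigl(1 - \sqrt{1 - 4r_j/\Lambda^2}\Bigr), & i > m, \; j \le m, \\[2mm]
\dfrac{s r_j}{2} \Bigl(1 - \sqrt{1 - 4r_i/\Lambda^2}\Bigr), & i \le m, \; j > m, \\[2mm]
s r_i r_j / \Lambda^2, & i, j > m.
\end{cases} \tag{7.5}$$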



We see that the $x_{ij}$ for $i, j \le m$ have a product form, but, in general, the factors are not
independent. We know from the earlier sections that, irrespective of symmetry, the dependence disappears
if we don't fix any diagonal elements. Fixing these elements imposes a global dependence,
as we saw in §6.
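Although (7.4) has no practical closed-form solution, it is straightforward to solve numerically. The Python sketch below finds $\Lambda$ by bisection on the interval given in Proposition 7.1 and then assembles the matrix from $x_{ij} = s\lambda_i\lambda_j$, which is equivalent to (7.5). It assumes the hypotheses of that proposition (all $r_i \le 1/3$, $n \ge 3$); the function names are ours, and bisection is simply one convenient root finder, not a method prescribed here.

```python
import math

def solve_big_lambda(r, m, iters=200):
    """Root of (7.4) by bisection on [2*sqrt(r_max), 4/3]."""
    def f(lam):
        head = sum(math.sqrt(max(0.0, 1.0 - 4.0 * ri / lam**2)) for ri in r[:m])
        tail = 2.0 * sum(r[m:]) / lam**2
        return head - tail - (m - 2)

    lo, hi = 2.0 * math.sqrt(max(r)), 4.0 / 3.0
    for _ in range(iters):
        mid = 0.5 * (lo + hi)
        if f(mid) > 0.0:          # f is increasing, so the root lies to the left
            hi = mid
        else:
            lo = mid
    return 0.5 * (lo + hi)

def fixed_diagonal_matrix(u, m):
    """Matrix with row/column sums u and the first m diagonal elements fixed to 0."""
    n, s = len(u), float(sum(u))
    r = [ui / s for ui in u]
    big = solve_big_lambda(r, m)
    lam = []
    for i, ri in enumerate(r):
        if i < m:                 # the "-" root of the quadratic in (7.3)
            lam.append(0.5 * big * (1.0 - math.sqrt(max(0.0, 1.0 - 4.0 * ri / big**2))))
        else:
            lam.append(ri / big)
    return [[0.0 if (i == j and i < m) else s * lam[i] * lam[j] for j in range(n)]
            for i in range(n)]
```

A quick sanity check on any output is that every row of the returned matrix sums to the corresponding $u_i$.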

Example 3 We consider the two extreme cases $m = n$ and $m = 1$. In addition, suppose
that all the $u_i$ are equal.

First let $m = n$. Then all $r_i$ are $1/n$ and $\Lambda = \sqrt{n/(n-1)}$. From the first line of (7.5)
the matrix $X$ has the form

$$X = \frac{s}{n(n-1)} \begin{pmatrix} 0 & 1 & \cdots & 1 \\ 1 & 0 & \cdots & 1 \\ \vdots & & \ddots & \vdots \\ 1 & 1 & \cdots & 0 \end{pmatrix}.$$

Compare this with the case where the diagonal is not fixed to 0, and the solution is
$x_{ij} = s r_i r_j = s/n^2$ for all $i, j$.

Now let $m = 1$. This is the simplest possible case: we have a square matrix with all
row and column sums equal, except that the single element $x_{11}$ is fixed to be 0. From (7.4)
we find $\Lambda = (n-1)/\sqrt{n(n-2)}$. From the last three lines of (7.5) we see that now $X$ is

$$X = \frac{s}{n(n-1)} \begin{pmatrix} 0 & 1 & \cdots & 1 \\ 1 & \frac{n-2}{n-1} & \cdots & \frac{n-2}{n-1} \\ \vdots & \vdots & & \vdots \\ 1 & \frac{n-2}{n-1} & \cdots & \frac{n-2}{n-1} \end{pmatrix}.$$

¹ This is also true because the solution to our strictly concave maximization problem is unique.



Example 4 Now consider the case $m = n$, but with the $r_i$ arbitrary. We obtain an
analytical approximation to the solution. If we let $\xi = 4/\Lambda^2$, (7.4) becomes

$$\sqrt{1 - r_1 \xi} + \cdots + \sqrt{1 - r_n \xi} = n - 2, \qquad \xi \in (9/4, 1/r_{\max}], \tag{7.6}$$

where the lower bound on $\xi$ comes from Proposition 7.1. This equation has the form
$f(\xi) = c$, and if $\xi_0$ is an approximation to its solution, the reversion technique in Ch. 1
of [Hen88] can be used to find the following power series for $\xi$: with $\rho_i = \sqrt{1 - r_i \xi_0}$ and
$\delta = f(\xi_0) - c = \rho_1 + \cdots + \rho_n - n + 2$,

$$\xi = \xi_0 + \frac{2}{r_1/\rho_1 + \cdots + r_n/\rho_n}\, \delta - \frac{r_1^2/\rho_1^3 + \cdots + r_n^2/\rho_n^3}{(r_1/\rho_1 + \cdots + r_n/\rho_n)^3}\, \delta^2 + \cdots. \tag{7.7}$$

It is known that this series converges, and it can be shown that $\delta < 1$ for any $\xi_0 \in (9/4, 1/r_{\max}]$². By (7.5) the non-diagonal matrix elements are given by
$\frac{s}{\xi}\bigl(1 - \sqrt{1 - r_i \xi}\bigr)\bigl(1 - \sqrt{1 - r_j \xi}\bigr)$, and (7.7) lets us find power series expansions for them in terms of $\delta$. We do not
show these series here, but the expansions to first order result in manageable expressions.
The accuracy of the expansions remains to be investigated.
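The two correction terms of (7.7) are simple to evaluate. The Python sketch below computes them for a given $r$ and starting point $\xi_0$, and also finds the root of (7.6) by bisection so that the two can be compared. The function names are ours, and the bisection is only a reference solver for the comparison.

```python
import math

def f(xi, r):
    """Left-hand side of (7.6): sum_i sqrt(1 - r_i * xi)."""
    return sum(math.sqrt(max(0.0, 1.0 - ri * xi)) for ri in r)

def series_terms(r, xi0):
    """First- and second-order corrections of the power series (7.7) at xi0."""
    n = len(r)
    rho = [math.sqrt(1.0 - ri * xi0) for ri in r]
    delta = sum(rho) - n + 2                          # delta = f(xi0) - c, c = n - 2
    r1 = sum(ri / p for ri, p in zip(r, rho))         # r_1/rho_1 + ... + r_n/rho_n
    r3 = sum(ri**2 / p**3 for ri, p in zip(r, rho))   # r_1^2/rho_1^3 + ...
    return 2.0 * delta / r1, -r3 / r1**3 * delta**2

def root_by_bisection(r, iters=200):
    """Reference root of (7.6) on [9/4, 1/r_max]; f is decreasing in xi."""
    lo, hi = 9.0 / 4.0, 1.0 / max(r)
    for _ in range(iters):
        mid = 0.5 * (lo + hi)
        if f(mid, r) > len(r) - 2:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)
```

A typical use is `t1, t2 = series_terms(r, 9/4)` followed by comparing `9/4 + t1 + t2` with `root_by_bisection(r)`.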



Now consider a numerical example with $u = (40, 20, 30, 40)$. We have $r = \bigl(\frac{4}{13}, \frac{2}{13}, \frac{3}{13}, \frac{4}{13}\bigr)$,
so solving (7.6) we find $\xi \approx 2.88018 \in \bigl(\frac{9}{4}, \frac{13}{4}\bigr)$. If we take $\xi_0 = 9/4$, (7.7) gives
$\xi \approx 2.25 + 0.749023 - 0.112889 = 2.8861$. Then the form
$\frac{s}{\xi}\bigl(1 - \sqrt{1 - r_i \xi}\bigr)\bigl(1 - \sqrt{1 - r_j \xi}\bigr)$ yields

$$X = \begin{pmatrix} 0 & 7.59 & 12.59 & 19.82 \\ 7.59 & 0 & 4.82 & 7.59 \\ 12.59 & 4.82 & 0 & 12.59 \\ 19.82 & 7.59 & 12.59 & 0 \end{pmatrix}
\quad \text{vs.} \quad
\begin{pmatrix} 12.31 & 6.15 & 9.23 & 12.31 \\ 6.15 & 3.08 & 4.62 & 6.15 \\ 9.23 & 4.62 & 6.92 & 9.23 \\ 12.31 & 6.15 & 9.23 & 12.31 \end{pmatrix},$$

the MaxEnt matrix without the 0-diagonal constraint, whose elements are simply $s r_i r_j$.
As we also saw in Example 3, the result of fixing the diagonal to 0 cannot be regarded as
a (small) perturbation of the $s r_i r_j$ form.

² Note that $\rho_i < 1 - \frac{1}{2} r_i \xi_0$, so $\rho_1 + \cdots + \rho_n < n - \xi_0/2$.

Generalization (a) The solution (7.5) is valid also when the $u_i$ are upper bounds on
the row sums, instead of specifying their values. In that case Corollary A.1 requires that
$\lambda_i \le 1$, which is true if

$$\forall i, \;\; 2\sqrt{r_i} \le \Lambda \le 2 \quad \text{or} \quad \Lambda > 2.$$

But this holds by virtue of Proposition 7.1.

(b) The diagonal elements can be set to arbitrary values $w_{11}, \ldots, w_{nn}$, if the $r_i$ are
re-defined as $(u_i - w_{ii})/s$. This actually requires a slight extension of Proposition 7.1; see
Proposition 7.2 below. And it can be verified that if we set $w_{ii} = u_i^2/s$ we get the expected
solution $x_{ij} = u_i u_j / s$, i.e. the $s r_i r_j$ form with the original $r_i = u_i/s$.

7.3 3-dimensional matrices with fixed diagonal 

The development of §7.2 can be extended to 3-dimensional matrices. These can be thought
of as contingency tables involving elements with 3 attributes, or as trip matrices where a
trip is characterized by an origin and a destination as in the 2-dimensional case, and, in
addition, by a class of vehicle, say, or as traffic matrices where traffic flows have origins,
destinations, and a size class, such as "small", "medium", "large". Whatever the three
attributes, we will index them by $i, j, k$. We will consider the case where the whole diagonal
is 0 and the available information is the sums over all $(i, k)$ sections and all $(j, k)$ sections
of the matrix:

$$\forall i \;\; \sum_j x_{ijk} = u_{ik}, \qquad \forall j \;\; \sum_i x_{ijk} = v_{jk}.$$

In the case of a traffic matrix for example, this means that we know the total number of
flows originating at $i$ and of size class $k$, and the total number ending at $j$ of size class $k$.
The matrix elements will then be

$$x_{ijk} = s \lambda_{ik} \mu_{jk} \text{ for } i \ne j, \text{ and 0 otherwise},$$

where the $\lambda_{ik}$ and $\mu_{jk}$ are s.t.

$$\forall i \;\; s \sum_{j \ne i} \lambda_{ik} \mu_{jk} = u_{ik}, \qquad \forall j \;\; s \sum_{i \ne j} \lambda_{ik} \mu_{jk} = v_{jk}.$$

Now let this information be symmetric w.r.t. $i$ and $j$, i.e. $v_{ik} = u_{ik}$. Further, define
$r_{ik} = u_{ik}/s$. Then the above constraints can be written as

$$\forall i \;\; \lambda_{ik} (\mu_{\cdot k} - \mu_{ik}) = r_{ik}, \qquad \forall j \;\; \mu_{jk} (\lambda_{\cdot k} - \lambda_{jk}) = r_{jk},$$

where the dot indicates summation over the corresponding index. Since the index $j$ in the
second set of constraints could have equally well been written $i$, we are led to consider a
solution with $\mu_{ik} = \lambda_{ik}$ and the single set of constraints

$$\forall i \;\; \lambda_{ik} (\lambda_{\cdot k} - \lambda_{ik}) = r_{ik}.$$

Proceeding as we did after (7.3), $\lambda_{ik} = \bigl(\lambda_{\cdot k} - \sqrt{\lambda_{\cdot k}^2 - 4 r_{ik}}\bigr)/2$. Adding these over $i$ and
setting $\xi_k = 4/\lambda_{\cdot k}^2$, we arrive at a generalization of (7.6):

$$\forall k \;\; \sqrt{1 - r_{1k} \xi_k} + \cdots + \sqrt{1 - r_{nk} \xi_k} = n - 2.$$

This is completely analogous to what we found in Example 3, except that here we have
one equation for each of the $\xi_k$. The final expression for the elements of the matrix is

$$x_{ijk} = \frac{s}{\xi_k} \Bigl(1 - \sqrt{1 - r_{ik} \xi_k}\Bigr)\Bigl(1 - \sqrt{1 - r_{jk} \xi_k}\Bigr) \text{ for } i \ne j, \text{ and 0 otherwise}.$$

Note that the matrix sections corresponding to different values of $k$ are independent of one
another. The above development generalizes to the case where only the first $m < n$ of the
diagonal elements are fixed, and in the other ways discussed in §7.2.
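Because the sections for different $k$ decouple, the 3-dimensional case can be computed one section at a time. The Python sketch below solves the equation for each $\xi_k$ by bisection and fills in the elements from the final expression above. The data layout (`u[i][k]` in, `x[i][j][k]` out) and the function names are our own choices, and the code raises an error when no root exists on $(0, 1/r_{\max}]$.

```python
import math

def solve_xi(r, iters=200):
    """Root of sum_i sqrt(1 - r_i*xi) = n - 2 on [0, 1/r_max], by bisection."""
    n = len(r)

    def g(xi):
        return sum(math.sqrt(max(0.0, 1.0 - ri * xi)) for ri in r) - (n - 2)

    lo, hi = 0.0, 1.0 / max(r)
    if g(hi) > 0.0:                       # g decreases from g(0) = 2
        raise ValueError("no root: one r_i is too large relative to the others")
    for _ in range(iters):
        mid = 0.5 * (lo + hi)
        if g(mid) > 0.0:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)

def zero_diagonal_3d(u):
    """u[i][k] is the required sum over j of x[i][j][k]; returns x[i][j][k]."""
    n, depth = len(u), len(u[0])
    s = float(sum(map(sum, u)))
    x = [[[0.0] * depth for _ in range(n)] for _ in range(n)]
    for k in range(depth):
        r = [u[i][k] / s for i in range(n)]
        xi = solve_xi(r)
        a = [1.0 - math.sqrt(max(0.0, 1.0 - ri * xi)) for ri in r]
        for i in range(n):
            for j in range(n):
                if i != j:
                    x[i][j][k] = s / xi * a[i] * a[j]
    return x
```

For a section in which all the $u_{ik}$ are equal to some value $c$, the off-diagonal elements of that section all come out as $c/(n-1)$, as one would expect.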

7.4 Given row and column sums, fixed diagonal blocks 

We generalize the development of §7.2 to equality constraints expressed by a block-diagonal
matrix $W$ with blocks $W_1, \ldots, W_m$, $m \ge 3$. This means that the $n$ nodes are partitioned
into $m$ sets $I_1, \ldots, I_m$, and the submatrix of $X$ that has rows and columns in $I_j$ is constrained
to equal $W_j$. So $X$ looks like

$$X = \begin{pmatrix} W_1 & \cdot & \cdots & \cdot \\ \cdot & W_2 & \cdots & \cdot \\ \vdots & & \ddots & \vdots \\ \cdot & \cdot & \cdots & W_m \end{pmatrix},$$

with the rows and columns of the block $W_j$ indexed by $I_j$,
where the rest of the entries are determined by the $u$-constraints and, as previously, are
given by $s\lambda_i\lambda_j$. Thus for the nodes in the set $I_1$ we have the equations

$$s\lambda_1 \cdot (\text{sum of } \lambda_j, \; j \notin I_1) = u_1 - (\text{sum of first row of } W_1),$$
$$s\lambda_2 \cdot (\text{sum of } \lambda_j, \; j \notin I_1) = u_2 - (\text{sum of second row of } W_1),$$

etc. Let $\lambda_{I_1}$ denote $\sum_{i \in I_1} \lambda_i$, and similarly for $\lambda_{I_2}$, etc. Also let $\Lambda = \lambda_{I_1} + \cdots + \lambda_{I_m}$. Then
the above equations can be written as

$$s\lambda_1 (\Lambda - \lambda_{I_1}) = u_1 - w_{1 I_1}, \quad s\lambda_2 (\Lambda - \lambda_{I_1}) = u_2 - w_{2 I_1}, \quad \ldots$$

where the meaning of the additional notation should be clear. If we now add these equations
by sides, the result can be written compactly as

$$\lambda_{I_1} (\Lambda - \lambda_{I_1}) = r_{I_1}, \quad \text{where } r_{I_1} = (u_{I_1} - w_{I_1 I_1})/s,$$

and where subscripts that are sets indicate summation over the respective sets. If we do
the same thing for the rows in $I_2, \ldots, I_m$, we arrive at the system of equations

$$\lambda_{I_1} (\Lambda - \lambda_{I_1}) = r_{I_1}, \quad \lambda_{I_2} (\Lambda - \lambda_{I_2}) = r_{I_2}, \quad \ldots, \quad \lambda_{I_m} (\Lambda - \lambda_{I_m}) = r_{I_m},$$

which has exactly the form (7.3) except that here the $r_{I_j}$ don't sum to 1, but to

$$\sigma = 1 - \frac{1}{s} \sum_{j=1}^{m} w_{I_j I_j} \le 1.$$

Of course, the $u_{I_j}$ and $w_{I_j I_j}$ are assumed to satisfy the consistency condition (6.1). Proceeding
just as in §7.2 we have

$$\lambda_{I_j} = \frac{1}{2} \Bigl(\Lambda - \sqrt{\Lambda^2 - 4 r_{I_j}}\Bigr),$$

so that $\Lambda$ is the root of the equation

$$\sqrt{1 - 4 r_{I_1}/\Lambda^2} + \cdots + \sqrt{1 - 4 r_{I_m}/\Lambda^2} = m - 2, \tag{7.8}$$

about which we have a generalization of Proposition 7.1:

Proposition 7.2 Suppose that $r_{I_1} + \cdots + r_{I_m} = \sigma \le 1$, and each $r_{I_j}$ is in $(0, \sigma/3]$. Then
for $m \ge 3$ equation (7.8) has a root in $[2\sqrt{r_{\max}}, 4\sqrt{\sigma}/3)$, where $r_{\max}$ is the largest of the
$r_{I_j}$.


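Given the root $\Lambda$ of (7.8), if $i \in I_k$, $\lambda_i$ is given by $2 r_i / \bigl(\Lambda + \sqrt{\Lambda^2 - 4 r_{I_k}}\bigr)$. But this
expression also equals $r_i \bigl(\Lambda - \sqrt{\Lambda^2 - 4 r_{I_k}}\bigr) / (2 r_{I_k})$. So the solution to our problem is: for
$i \in I_k$, $j \in I_h$, $h \ne k$,

$$x_{ij} = \frac{s \Lambda^2}{4}\, \frac{r_i r_j}{r_{I_k} r_{I_h}} \Bigl(1 - \sqrt{1 - 4 r_{I_k}/\Lambda^2}\Bigr)\Bigl(1 - \sqrt{1 - 4 r_{I_h}/\Lambda^2}\Bigr), \tag{7.9}$$

where

$$r_i = \frac{u_i - w_{i I_k}}{s}, \qquad r_{I_k} = \sum_{l \in I_k} r_l = \frac{u_{I_k} - w_{I_k I_k}}{s}.$$

Suppose that all blocks are of size 1, so $m = n$ and the constraints are $x_{ii} = 0$. Then it
is easily seen that (7.9) gives the same result as (7.5). An analytical approximation to the
solution of (7.8), and to the matrix elements themselves, can be found by the power series
(7.7).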
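The block-diagonal solution can also be evaluated numerically. The following Python sketch computes the $r_i$ and $r_{I_j}$, finds the root $\Lambda$ of (7.8) by bisection on the interval of Proposition 7.2, and fills in $x_{ij} = s\lambda_i\lambda_j$ outside the fixed blocks, which is equivalent to (7.9). The function name and the input layout (a list of index lists plus a matching list of square blocks) are our own, and the code raises an error if the interval does not bracket a root.

```python
import math

def fixed_blocks_matrix(u, blocks, w_blocks, iters=200):
    """u: row/column sums; blocks: index lists I_1..I_m; w_blocks: the fixed W_1..W_m."""
    n, m, s = len(u), len(blocks), float(sum(u))
    r = [0.0] * n                                   # r_i = (u_i - w_{i I_k}) / s
    for idx, wk in zip(blocks, w_blocks):
        for a, i in enumerate(idx):
            r[i] = (u[i] - sum(wk[a])) / s
    r_blk = [sum(r[i] for i in idx) for idx in blocks]
    sigma = sum(r_blk)

    def f(lam):                                     # l.h.s. of (7.8) minus (m - 2)
        return sum(math.sqrt(max(0.0, 1.0 - 4.0 * q / lam**2)) for q in r_blk) - (m - 2)

    lo, hi = 2.0 * math.sqrt(max(r_blk)), 4.0 * math.sqrt(sigma) / 3.0
    if f(lo) > 0.0 or f(hi) <= 0.0:
        raise ValueError("interval of Proposition 7.2 does not bracket a root")
    for _ in range(iters):
        mid = 0.5 * (lo + hi)
        if f(mid) > 0.0:
            hi = mid
        else:
            lo = mid
    big = 0.5 * (lo + hi)

    lam = [0.0] * n
    for idx, q in zip(blocks, r_blk):
        root = math.sqrt(max(0.0, big**2 - 4.0 * q))
        for i in idx:
            lam[i] = 2.0 * r[i] / (big + root)

    x = [[s * lam[i] * lam[j] for j in range(n)] for i in range(n)]
    for idx, wk in zip(blocks, w_blocks):           # overwrite the fixed blocks
        for a, i in enumerate(idx):
            for b, j in enumerate(idx):
                x[i][j] = float(wk[a][b])
    return x
```

As with the earlier sketches, checking that each row of the result sums to $u_i$ is a quick way to validate the inputs.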

Finally, the solution (7.9) holds even when the $u_i$ are upper bounds on the row and
column sums. In that case Corollary A.1 requires $\lambda_{I_j} \le 1$, which holds if $\forall j, \; 2\sqrt{r_{I_j}} \le \Lambda \le 2$.
But this last condition obtains by virtue of Proposition 7.2.

8 Conclusion



Table 8.1 summarizes the problems for which we obtained results in this paper. We saw
that the most likely/MaxEnt matrices exhibit as much independence, symmetry, and
uniformity as possible subject to the available information or constraints. Further, they are
robust with respect to changes in the information/constraints. Lastly, given independent
constraints on the rows and columns, the matrix elements have a "product of independent
factors" form, unless some of them are fixed, in which case the independence disappears.

Rectangular matrices/contingency tables
    Given row sums and some column sums
    Bounds on row sums
    Total sum and bounds on row sums
    Bounds on total sum and row sums
    Bounds on row and column sums
    Bounds on row sums and on individual elements

Square matrices with symmetric information
    Total sum and bounds on row and column sums
    Given row sums and partially-fixed diagonal, with extension to 3d matrices
    Given row sums and fixed diagonal blocks

Table 8.1: Summary of cases solved.

The types of constraints that we considered were relatively simple, as befits an initial 
exploration of the space of analytical solutions. The aim was to have enough basic results to 
establish a framework for further investigations, perhaps motivated by constraints arising 
in concrete problems. 

Finally, even though we used the discrete balls-and-boxes framework throughout, all
that is said in this paper applies also to deriving 2-dimensional discrete probability distributions
from incomplete information, if we think of the balls as "probability quanta" thrown
into the boxes. Jaynes [Jay03] calls this the "Wallis derivation" of MaxEnt probability
distributions.

Acknowledgments Thanks to my colleagues Howard Karloff and N. J. A. Sloane for interesting
and helpful discussions.

A Auxiliary results and Proofs

A.1 Optimal solutions of concave programs

We review some standard terminology and results.

Suppose $C$ is a convex set in $\mathbb{R}^n$. A concave program is the problem of maximizing a
concave function $f$ on this set, subject to a number of equality and inequality constraints:

$$\max_{x \in C} f(x) \quad \text{subject to} \quad g_i(x) = 0, \; i = 1, \ldots, \ell, \qquad h_j(x) \le 0, \; j = 1, \ldots, m, \tag{A.1}$$

where the $g_i$ are linear on $C$ (and assumed linearly independent) and the $h_j$ are convex
on $C$. All $x \in C$ satisfying the constraints are called feasible. The Lagrangean function
associated with the concave program (A.1) is

$$\Phi(x, \alpha, \beta) = f(x) - \sum_i \alpha_i g_i(x) - \sum_j \beta_j h_j(x). \tag{A.2}$$

The following result (Theorem 2.30 in [ADSZ88], or §5.5.3 of [BV04]) gives sufficient conditions
for solving a concave program in which all functions are differentiable on (the interior
of) $C$:

Theorem 1 If $x^*$ is feasible, and there are $\alpha^*, \beta^*$ such that

$$\nabla_x \Phi(x^*, \alpha^*, \beta^*) = 0, \qquad \beta_j^* h_j(x^*) = 0 \;\text{ and }\; \beta_j^* \ge 0 \;\; \forall j,$$

then $x^*$ solves the concave program (A.1).

Also recall that if a strictly concave function on a convex set has a maximum, the
maximizing point is unique (Theorem 2.22 in [ADSZ88]).

Corollary A.1 Suppose the function $f$ in (A.1) is the entropy, and all the constraints are
linear and involve coefficients that are either 0 or 1. Then the elements of $x^*$ have the
form

$$x_k^* = \prod_{i \in E_k} \alpha_i' \prod_{j \in I_k} \beta_j', \qquad \alpha_i' > 0, \; \beta_j' \in (0, 1],$$

where $E_k$ is the set of indices of the equalities $g_i$ in which $x_k$ appears, and $I_k$ is the set
of indices of the inequalities $h_j$ where $x_k$ appears. The $j$-th inequality constraint can be
satisfied either as a strict inequality or as an equality, and we must have

$$h_j(x^*) \ln \beta_j' = 0. \tag{A.3}$$

Corollary A.2 If the function $f$ in (A.1) is the entropy difference function $G$ of (5.1)
and the constraints are as in Corollary A.1, then

$$x_k^* = \Bigl(\sum_l x_l^*\Bigr) \prod_i \alpha_i' \prod_j \beta_j', \qquad \alpha_i' > 0, \; \beta_j' \in (0, 1],$$

with the products taken over the same constraints as in Corollary A.1, and where the $\beta_j'$ must satisfy (A.3).

A.2 Proofs for §4

Proof of Lemma 4.1

The Lagrangean is

$$\Phi = -\sum_i x_i \ln x_i - \lambda_1 (x_1 - b_1) - \cdots - \lambda_n (x_n - b_n) - \mu (x_1 + \cdots + x_n - a).$$

Setting $\nabla \Phi$ to 0, we have for all $i$

$$x_i = e^{-\lambda_i - \mu - 1} \triangleq \lambda_i \mu. \tag{A.4}$$

By Corollary A.1, for a point $(x_1, \ldots, x_n)$ given by (A.4) to solve the problem the following
must hold:

a) $(x_1, \ldots, x_n)$ must be feasible,

b) By (A.3), we must have $\lambda_i \in (0, 1]$ for all $i$, and $(x_i - b_i) \ln \lambda_i = 0$.

Now arrange the $b_i$ and $x_i$ as stated in part (i) of the lemma. Consider the solution

$$\begin{aligned}
x_i &= b_i = \lambda_i \mu, \text{ with } \lambda_i \le 1, & i &= 1, \ldots, k, \\
x_i &= \mu, \text{ with } \lambda_i = 1, & i &= k+1, \ldots, n,
\end{aligned} \tag{A.5}$$

in accordance with (b) above, where $k$ is as yet undetermined. Putting (A.5) into the
equality constraint we get $b_1 + \cdots + b_k + (n-k)\mu = a$. It follows that

$$x_{k+1} = \cdots = x_n = \mu = \frac{a - (b_1 + \cdots + b_k)}{n - k}. \tag{A.6}$$

Now let $k$ be chosen as in part (ii) of the lemma. Then the solution $(x_1, \ldots, x_n)$ given by
(A.5), (A.6) is feasible as required in (a) above: by the definition of $k$, $b_1 + \cdots + b_{k+1} +
(n - k - 1) b_{k+1} > a$, which is equivalent to $\mu < b_{k+1}$.

To satisfy (b), we need to check that $\lambda_i \le 1$ for $i = 1, \ldots, k$. From (A.5) and (A.6),

$$\lambda_i = \frac{(n-k)\, b_i}{a - (b_1 + \cdots + b_k)} \quad \text{and} \quad \lambda_i \le 1 \iff a - (b_1 + \cdots + b_k) \ge (n-k)\, b_i.$$

But this last condition holds $\forall i \le k$ by the definition of $k$. We have found a solution $x^*$,
and because the entropy function is strictly concave, this solution is unique and we are
done. It remains to show that it is possible to find a $k$ as required in part (ii) of the lemma.
This is done in Proposition A.1 below.

Proposition A.1 Given $b_0 = 0 < b_1 \le b_2 \le \cdots \le b_n$ and $0 < a \le b_1 + \cdots + b_n$, there is a
$k \in \{0, \ldots, n\}$ s.t. the inequality

$$a - (b_1 + \cdots + b_j) \ge (n - j)\, b_j$$

holds for all $j \le k$ and for no larger $j$.

Proof Consider the function $\varphi(j) = a - (b_1 + \cdots + b_j) - (n-j) b_j$, $j \in \{0, 1, \ldots, n\}$. It is
easy to see that $\varphi(j) \ge \varphi(j+1)$ for all $j$, since $\varphi(j) - \varphi(j+1) = (n-j)(b_{j+1} - b_j) \ge 0$, so this function is monotone decreasing. Further,
$\varphi(0) = a > 0$ and $\varphi(n) = a - (b_1 + \cdots + b_n) \le 0$. So there is a $k \le n$ s.t. $\varphi(j) \ge 0$ for
$j \le k$, and $\varphi(j) < 0$ for $j > k$, as claimed. Note that $k = n$ iff $b_1 + \cdots + b_n = a$. □

A.3 Proofs for §5

Proposition A.2 The function

$$G(x_1, \ldots, x_n) = \Bigl(\sum_i x_i\Bigr) \ln \Bigl(\sum_i x_i\Bigr) - \sum_i x_i \ln x_i$$

is concave over the domain $x_1 > 0, \ldots, x_n > 0$.

This is probably known somewhere in the information theory literature, but I don't
know where. So a proof is presented below.

Proof By Theorem 2.14 of [ADSZ88] it suffices to show that $\mathcal{H}(x) = \nabla^2 G(x)$, the Hessian
of $G$, is negative semi-definite. We find

$$\mathcal{H}(x) = \frac{1}{x_1 + \cdots + x_n}\, U_n - \mathrm{diag}\Bigl(\frac{1}{x_1}, \ldots, \frac{1}{x_n}\Bigr),$$

where $U_n$ is a matrix all of whose entries are 1, and for an arbitrary vector $y = (y_1, \ldots, y_n)$
we must have $y^T \mathcal{H} y \le 0$. To establish this, first write $\mathcal{H}$ as

$$\mathcal{H}(x) = \frac{1}{x_1 + \cdots + x_n} \Bigl( U_n - \mathrm{diag}\Bigl(\frac{x_1 + \cdots + x_n}{x_1}, \ldots, \frac{x_1 + \cdots + x_n}{x_n}\Bigr) \Bigr).$$

Now define $\xi_i = x_i/(x_1 + \cdots + x_n)$. The condition $y^T \mathcal{H} y \le 0$ is then equivalent to

$$(y_1 + \cdots + y_n)^2 \le y_1^2/\xi_1 + \cdots + y_n^2/\xi_n, \tag{A.7}$$

where the $\xi_i$ are positive and sum to 1. The truth of (A.7) follows from the fact that
$y_1^2/\xi_1 + \cdots + y_n^2/\xi_n$ is a convex function of $\xi_1, \ldots, \xi_n$ over the domain $\xi_1 > 0, \ldots, \xi_n > 0$,
and its minimum under the constraint $\xi_1 + \cdots + \xi_n = 1$ occurs at $\xi_i = |y_i|/(|y_1| + \cdots + |y_n|)$.
So the least value of the r.h.s. of (A.7) as a function of $\xi_1, \ldots, \xi_n$ is $(|y_1| + \cdots + |y_n|)^2$, which is at least $(y_1 + \cdots + y_n)^2$.
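As an informal numerical check of the negative semi-definiteness claimed above (not a substitute for the proof), the quadratic form $y^T \mathcal{H}(x)\, y = (y_1 + \cdots + y_n)^2/(x_1 + \cdots + x_n) - \sum_i y_i^2/x_i$ can be evaluated at random points. The sketch below does this in Python; the function name, sample sizes and ranges are arbitrary choices of ours.

```python
import random

def quad_form(x, y):
    """y^T H(x) y for H(x) = U_n / (x_1+...+x_n) - diag(1/x_1, ..., 1/x_n)."""
    return sum(y) ** 2 / sum(x) - sum(yi * yi / xi for xi, yi in zip(x, y))

random.seed(0)
# Largest value of the quadratic form over 1000 random (x, y) pairs with x > 0.
worst = max(
    quad_form([random.uniform(0.1, 10.0) for _ in range(5)],
              [random.uniform(-1.0, 1.0) for _ in range(5)])
    for _ in range(1000)
)
```

The form is never positive, and it vanishes when $y$ is proportional to $x$, which is the equality case of (A.7).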



Proof of Proposition 5.1

We first give a straightforward proof assuming that $x_{ij} \in \mathbb{N}$. Suppose there is a matrix $X$
s.t. for some $i, j$ row $i$ sums to less than $u_i$ and column $j$ to less than $v_j$. Further, let $X$
have total sum $s$. Consider the matrix $X'$ formed by adding 1 to $x_{ij}$. First, if $X$ satisfies
the constraints, so does $X'$. Second, $\#(X' \mid I)/\#(X \mid I) = (s+1)/(x_{ij}+1) > 1$. Thus $X'$
has more realizations than $X$, and so $X$ cannot be the most likely matrix.

To give a proof assuming that the elements of $X$ are non-negative reals, by Corollary
A.2 we must have $x_{ij} = \bigl(\sum_{k,l} x_{kl}\bigr) \lambda_i \mu_j$. Now if there is a pair $i, j$ s.t. $\sum_j x_{ij} - u_i < 0$ and
$\sum_i x_{ij} - v_j < 0$, we must have $\lambda_i = \mu_j = 1$, so $x_{ij} = \sum_{k,l} x_{kl}$. Thus all other elements of
$X$ must be 0. Further, $x_{ij} \le \min(u_i, v_j)$. But it is easy to see that this matrix cannot have
the most realizations.

Proof of Proposition 5.2

Consider the function $\varphi(\ell) = u - (v_1 + \cdots + v_\ell) - (n - \ell) v_{\ell+1}$. It is easy to check that $\varphi(\ell)$
decreases as $\ell$ increases. Further, $\varphi(0) = u - n v_1 \ge 0 \iff u/n \ge v_1$. Finally, $\varphi(n-1) = u - (v_1 + \cdots + v_n) < 0$.
Thus there is a least $\ell$ s.t. $\varphi(\ell) < 0$, $1 \le \ell \le n-1$, and $\varphi(\ell - 1) \ge 0$. Let that $\ell$ be $k$.

The two conditions $\varphi(k) < 0$ and $\varphi(k-1) \ge 0$ establish what is claimed.

A.4 Proofs for §6

The following result is a variation of Lemma 4.1; it says that the most likely vector with
sum bounded by $a$ and elements bounded by the vector $b$ is the MaxEnt vector with sum
equal to $a$ and elements bounded by $b$.

Lemma A.1 The most likely vector $x^* = (x_1^*, \ldots, x_m^*)$ satisfying $\forall i \; x_i \le b_i$ and
$x_1 + \cdots + x_m \le a$, $a, b_i \in \mathbb{N}$, is found as follows. If $a \le b_1 + \cdots + b_m$, the inequality in this
constraint can be replaced by equality and then $x^*$ is given by Lemma 4.1. If $a > b_1 + \cdots + b_m$,
then $x^* = (b_1, \ldots, b_m)$.

Proof First we reduce the problem in $\mathbb{N}$ to another problem in $\mathbb{N}$. Suppose that $a \le
b_1 + \cdots + b_m$. Let $y = (y_1, \ldots, y_m)$, $y_i \in \mathbb{N}$, be the most likely vector summing to $q \le a - 1$ and
satisfying $y_i \le b_i$. Pick a $y_j$ s.t. $y_j < b_j$; this exists because $y$ sums to $q$, which is strictly
less than $b_1 + \cdots + b_m$. But then the vector $y' = (y_1, \ldots, y_{j-1}, y_j + 1, y_{j+1}, \ldots, y_m)$ sums to
$q + 1$, satisfies the $b$-constraints, and by the argument given earlier $\#(y' \mid q+1) > \#(y \mid q)$.
So by increasing the allowed sum $q$ we get a more likely vector. It follows that the most
likely vector $x^*$ in $\mathbb{N}$ satisfying the constraints sums to exactly $a$, and (an approximation
in $\mathbb{R}$) can therefore be found by Lemma 4.1.

Now let $a > b_1 + \cdots + b_m$. In that case the $a$-constraint is irrelevant and we have
precisely the problem solved earlier for a matrix; so $x^* = (b_1, \ldots, b_m)$. □

A.5 Proofs for §7.2

A.5.1 Proof of Proposition 7.1

We already noted that the function

$$f(\Lambda) = \sqrt{1 - 4r_1/\Lambda^2} + \cdots + \sqrt{1 - 4r_m/\Lambda^2} - 2(r_{m+1} + \cdots + r_n)/\Lambda^2 - (m - 2)$$

is monotone increasing for any $m \le n$. We will now show that $f(4/3) > 0$ and $f(2\sqrt{r_{\max}}) \le 0$.

$f > 0$ at $4/3$. This reduces to showing that

$$\sqrt{1 - \tfrac{9}{4} r_1} + \cdots + \sqrt{1 - \tfrac{9}{4} r_m} - \tfrac{9}{8}(r_{m+1} + \cdots + r_n) > m - 2. \tag{A.8}$$

The l.h.s. has the form $\sum_i \varphi_i(r_i)$ where $\varphi_i(\cdot)$ is concave, so it is a concave function of
$r_1, \ldots, r_n$ (Prop. 2.16 of [ADSZ88]) over the convex domain defined by $r_1 + \cdots + r_n = 1$ and
$0 < r_i \le 1/3$. Therefore its minimum occurs on the boundary of the domain ([ADSZ88],
Prop. 2.25). The boundary consists of all points s.t. three of the $r_i$ are $1/3$ and the rest
are 0. There are several cases to consider. First, it is easy to check that (A.8) holds for
$m = 0$ and $m = 1$.

Next let $m = 2$. What we want to prove reduces to $\sqrt{1 - \tfrac{9}{4} r_1} + \sqrt{1 - \tfrac{9}{4} r_2} -
\tfrac{9}{8}(r_3 + \cdots + r_n) > 0$. The possibilities for the boundary are $r_1 = r_2 = r_3 = 1/3$, or
$r_1 = 1/3$, $r_3 = r_4 = 1/3$, or $r_3 = r_4 = r_5 = 1/3$, and the desired inequality holds under any
of these conditions.

Lastly suppose that $m \ge 3$, and, without loss of generality, that $r_1 = r_2 = r_3 = 1/3$.
Then (A.8) becomes $3/2 + m - 3 > m - 2$, which is true. Next, let $r_1 = r_2 = 1/3$,
$r_{m+1} = 1/3$; (A.8) becomes $1 + m - 2 - 3/8 > m - 2$, which is also true. The remaining
two cases are $r_1 = 1/3$, $r_{m+1} = r_{m+2} = 1/3$, and $r_{m+1} = r_{m+2} = r_{m+3} = 1/3$, and (A.8)
holds for both.

$f \le 0$ at $2\sqrt{r_{\max}}$. Without loss of generality we may assume that $r_{\max} = r_1$ because this
makes the notation simpler. Then $f(2\sqrt{r_1}) \le 0$ reduces to establishing

$$\sqrt{1 - r_2/r_1} + \cdots + \sqrt{1 - r_m/r_1} - \frac{r_{m+1} + \cdots + r_n}{2 r_1} \le m - 2. \tag{A.9}$$

We will find the maximum of the function on the l.h.s., treating $r_1$ as known for the
moment. Using Theorem 1, the l.h.s. is a concave function of $r_2, \ldots, r_n$, and under the
constraint $r_2 + \cdots + r_n = 1 - r_1$ it has a unique maximum at the point determined by

$$\frac{1}{2 r_1} = \frac{1}{2 r_1} \Bigl(1 - \frac{r_2}{r_1}\Bigr)^{-1/2} = \cdots = \frac{1}{2 r_1} \Bigl(1 - \frac{r_m}{r_1}\Bigr)^{-1/2}.$$

Thus the maximum occurs at the point $r_2 = \cdots = r_m = 0$ and $r_{m+1} + \cdots + r_n = 1 - r_1$,
where the value of the function is $m - 1 - (1 - r_1)/(2 r_1)$. Therefore (A.9) will hold iff
$r_1 \le 1/3$.

The above proof assumed that $m < n$. When $m = n$, (A.9) becomes

$$\sqrt{1 - r_2/r_1} + \cdots + \sqrt{1 - r_n/r_1} \le n - 2. \tag{A.10}$$

As before, the l.h.s. is a concave function for fixed $r_1$, and its maximum occurs at $r_2 =
\cdots = r_n = (1 - r_1)/(n - 1)$. Thus (A.10) holds if

$$(n - 1) \sqrt{1 - \frac{1 - r_1}{(n-1)\, r_1}} \le n - 2,$$

which is true if $r_1 \le 1/3$.



Proof of Proposition 7.2 We re-use the proof of Proposition 7.1. Define $f(\Lambda) =
\sqrt{1 - 4 r_{I_1}/\Lambda^2} + \cdots + \sqrt{1 - 4 r_{I_m}/\Lambda^2} - (m - 2)$. Setting $\rho_i = r_{I_i}/\sigma$, this becomes

$$f(\Lambda) = \sqrt{1 - 4\sigma\rho_1/\Lambda^2} + \cdots + \sqrt{1 - 4\sigma\rho_m/\Lambda^2} - (m - 2), \qquad \sum_i \rho_i = 1.$$

Then $f(4\sqrt{\sigma}/3) > 0$ is equivalent to $\sqrt{1 - \tfrac{9}{4}\rho_1} + \cdots + \sqrt{1 - \tfrac{9}{4}\rho_m} > m - 2$; but this is
a special case of (A.8). It remains to show that $f(2\sqrt{r_{\max}}) = f(2\sqrt{\sigma \rho_{\max}}) \le 0$. Assuming
w.l.o.g. that $\rho_{\max} = \rho_1$, this reduces to $\sqrt{1 - \rho_2/\rho_1} + \cdots + \sqrt{1 - \rho_m/\rho_1} \le m - 2$, which
follows from (A.10).